" * write output.state after.block = add.period write newline

ENTRY address archivePrefix author booktitle chapter edition editor eid eprint howpublished institution isbn journal key month note number organization pages publisher school serie

78 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

method 1

citation-polarity summary

use method 1

claims ledger

method It compares the model's prediction for the entire trajectory against the ground-truth trajectory label, $L_{\text{traj}}(s, a)$:

$$\mathcal{L}_{\text{traj}} = \mathcal{L}_{\text{BCE}}\big(R_\phi(s, a \mid x),\, R_{\text{traj}}\big) \tag{12}$$

where $\sigma(\cdot)$ denotes the sigmoid function, which converts the model's raw logit outputs into probabilities. $\mathcal{L}_{\text{BCE}}(\cdot, \cdot)$ denotes the BCE loss function. For a ground-truth label $L \in \{0, 1\}$ and a model logit output $R_\phi$, it is defined as

$$\mathcal{L}_{\text{BCE}}(R_\phi, L) = -\big[L \log \sigma(R_\phi) + (1 - L) \log\big(1 - \sigma(R_\phi)\big)\big].$$

By jointly optimizing this objective, Fin-PRM is train
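
The BCE definition quoted in this ledger entry can be checked numerically. The following is a minimal sketch in plain Python, not code from Fin-PRM: the function names `sigmoid`, `bce_naive`, and `bce_with_logits` are hypothetical, and the stable form is a standard algebraic rewrite of the same expression rather than anything the cited paper specifies.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce_naive(logit: float, label: int) -> float:
    """Literal transcription: -[L log s(R) + (1 - L) log(1 - s(R))]."""
    p = sigmoid(logit)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def bce_with_logits(logit: float, label: int) -> float:
    """Same quantity, rewritten as max(R, 0) - R*L + log(1 + exp(-|R|)).

    This form never takes log(0), so it stays finite for extreme logits.
    """
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


if __name__ == "__main__":
    # Hypothetical logits and labels, chosen only for illustration.
    for logit, label in [(0.0, 1), (2.0, 0), (-3.5, 1)]:
        print(logit, label, bce_naive(logit, label), bce_with_logits(logit, label))
```

The two functions agree for moderate logits; the rewritten form is the one to prefer when logits can be large, because the literal transcription evaluates `log(0)` once the sigmoid saturates.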

representative citing papers

Why Training-Free Token Reduction Collapses: The Inherent Instability of Pairwise Scoring Signals

cs.AI · 2026-04-17 · unverdicted · novelty 7.0

Pairwise scoring signals in Vision Transformer token reduction are inherently unstable due to high perturbation counts and degrade in deep layers, causing collapse, while unary signals with triage enable CATIS to retain 96.9% accuracy at 63% FLOPs reduction on ViT-Large ImageNet-1K.

Dynamic Tool Dependency Retrieval for Lightweight Function Calling

cs.LG · 2025-12-18 · unverdicted · novelty 7.0

DTDR dynamically retrieves relevant tools by modeling dependencies from demonstrations and conditioning on the evolving agent plan, improving function calling success rates by 23-104% over static retrievers across benchmarks.

Incremental Data-Driven Policy Synthesis via Game Abstractions

cs.GT · 2025-11-14 · unverdicted · novelty 7.0

An incremental rank-lifting algorithm updates winning regions and policies in data-driven stochastic game abstractions by exploiting monotonic growth of under-approximations and shrinkage of over-approximations.

Unified Work Embeddings: Contrastive Learning of a Bidirectional Multi-task Ranker

cs.CL · 2025-11-11 · unverdicted · novelty 7.0

UWE is a task-agnostic bi-encoder that uses many-to-many InfoNCE and token-level soft late interaction to achieve zero-shot ranking across unseen work-related target spaces while using far fewer parameters than Qwen3-8B and improving MAP by 4.4 points.

MAS-Bench: A Unified Benchmark for Shortcut-Augmented Hybrid Mobile GUI Agents

cs.AI · 2025-09-08 · conditional · novelty 7.0

MAS-Bench introduces 139 tasks, 88 predefined shortcuts, and 9 metrics to evaluate hybrid GUI-shortcut mobile agents, reporting up to 68.3% success and 39% efficiency gains over GUI-only baselines.

MECAT: A Multi-Experts Constructed Benchmark for Fine-Grained Audio Understanding Tasks

eess.AS · 2025-07-31 · unverdicted · novelty 7.0

MECAT is a multi-expert benchmark for audio AI offering fine-grained captions and QA pairs generated via expert models and LLM reasoning, paired with the DATE metric that combines semantic similarity and cross-sample discriminability to favor detailed outputs.

VoteGCL: Enhancing Graph-based Recommendations with Majority-Voting LLM-Rerank Augmentation

cs.IR · 2025-07-29 · unverdicted · novelty 7.0

VoteGCL augments graph-based recommendation systems with high-confidence synthetic interactions generated via majority-voting LLM reranks and integrates them into graph contrastive learning to improve accuracy and reduce popularity bias.

One Step is Enough: Multi-Agent Reinforcement Learning based on One-Step Policy Optimization for Order Dispatch on Ride-Sharing Platforms

cs.AI · 2025-07-21 · conditional · novelty 7.0

OSPO trains optimal order dispatch policies for homogeneous AV fleets using only one-step group rewards, outperforming GRPO on a real ride-hailing dataset.

TRAM: Test-Time Risk Adaptation with Mixture of Agents

cs.LG · 2024-08-16 · unverdicted · novelty 7.0

TRAM is a test-time mixture method that scores and composes risk-neutral source policies using reward and occupancy-based risk to achieve new reward-risk tradeoffs without parameter updates.

Assessing How Hate, Counterspeech, and Toxicity Affect Hate Group Newcomers

cs.CY · 2024-05-28 · unverdicted · novelty 7.0

Counterspeech reduces the likelihood that hate-speech-using newcomers continue posting in hate subreddits, though toxic counterspeech raises the chance of continued hostility in the thread.

Attacking the Spike: On the Transferability and Security of Spiking Neural Networks to Adversarial Examples

cs.NE · 2022-09-07 · unverdicted · novelty 7.0

MDSE attack uses dynamic multi-surrogate gradient estimation to create adversarial examples that simultaneously fool SNNs, ViTs, and CNNs, with reported gains up to 91.4% on ensembles and 3x on adversarially trained SNNs versus Auto-PGD.

When Reasoning Models Hurt Behavioral Simulation: A Solver-Sampler Mismatch in Multi-Agent LLM Negotiation

cs.LG · 2026-04-12 · unverdicted · novelty 6.0

Stronger reasoning models in LLMs reduce behavioral negotiation by defaulting to authority outcomes in multi-agent settings, unlike structured scaffolds that enable concessions.

Overcoming the "Impracticality" of RAG: Proposing a Real-World Benchmark and Multi-Dimensional Diagnostic Framework

cs.CL · 2026-04-03 · unverdicted · novelty 6.0

Introduces a four-axis difficulty taxonomy integrated into an enterprise RAG benchmark to systematically diagnose multi-dimensional challenges like reasoning complexity and retrieval difficulty.

Beyond Static: Related Questions Retrieval Through Conversations in Community Question Answering

cs.IR · 2026-03-09 · unverdicted · novelty 6.0

TeCQR retrieves related questions in cQA by generating tag-enhanced clarifying questions, using noise-tolerant semantic matching, and two-stage training to learn fine-grained representations of queries, questions, and tags.

STORM: Segment, Track, and Object Re-Localization from a Single Image

cs.CV · 2025-11-12 · unverdicted · novelty 6.0

STORM enables robust reference-conditioned 6D pose tracking from one image via hierarchical spatial fusion attention and a BCE-trained verifier that detects drift for automatic re-initialization.

Routing-Based Continual Learning for Multimodal Large Language Models

cs.LG · 2025-11-03 · unverdicted · novelty 6.0

Routing architecture for MLLMs enables continual learning with constant compute, matching multi-task learning performance and supporting cross-modal transfer.

Less Precise Can Be More Reliable: A Systematic Evaluation of Quantization's Impact on VLMs Beyond Accuracy

cs.CV · 2025-09-25 · unverdicted · novelty 6.0 · 2 refs

Quantization of VLMs improves multiple reliability metrics beyond accuracy by damping high-rank spectral components and promoting reliance on robust low-rank features.

Forget What's Sensitive, Remember What Matters: Token-Level Differential Privacy in Memory Sculpting for Continual Learning

cs.AI · 2025-09-16 · unverdicted · novelty 6.0

PeCL applies token-level dynamic differential privacy and privacy-guided memory sculpting to achieve superior privacy-utility balance in continual learning.

LoFT: Parameter-Efficient Fine-Tuning for Long-tailed Semi-Supervised Learning in Open-World Scenarios

cs.LG · 2025-09-12 · unverdicted · novelty 6.0

LoFT uses parameter-efficient fine-tuning of foundation models for long-tailed semi-supervised learning, supported by proofs that this reduces hypothesis complexity to minimize balanced posterior error and compresses outlier acceptance regions, with LoFT-OW handling open-world OOD cases.

Beyond I'm Sorry, I Can't: Dissecting Large Language Model Refusal

cs.CL · 2025-09-07 · unverdicted · novelty 6.0

Sparse autoencoders plus greedy filtering and factorization-machine interaction modeling identify minimal sets of features in Gemma-2-2B-IT and LLaMA-3.1-8B-IT whose ablation produces jailbreaks by flipping refusal to compliance.

TeRA: Vector-based Random Tensor Network for High-Rank Adaptation of Large Language Models

cs.LG · 2025-09-03 · unverdicted · novelty 6.0

TeRA parametrizes high-rank LLM weight updates via a random Tucker-like tensor network with shared frozen factors and layer-specific scaling vectors, matching high-rank adapter performance at vector-level parameter counts.

CogDriver: Integrating Cognitive Inertia for Temporally Coherent Planning in Autonomous Driving

cs.CV · 2025-08-31 · unverdicted · novelty 6.0

CogDriver-Agent with sparse temporal memory and spatiotemporal distillation on CogDriver-Data achieves 22% higher closed-loop Driving Score on Bench2Drive and 21% lower mean L2 error on nuScenes.

Learning to Refine: Self-Refinement of Parallel Reasoning in LLMs

cs.LG · 2025-08-27 · conditional · novelty 6.0

GSR jointly trains LLMs to generate candidate solutions and refine a superior final answer from them, achieving state-of-the-art performance on five mathematical benchmarks while transferring across model scales.

Fin-PRM: A Domain-Specialized Process Reward Model for Financial Reasoning in Large Language Models

cs.CL · 2025-08-21 · unverdicted · novelty 6.0

Fin-PRM is a domain-specialized process reward model that supplies binary step-level and trajectory-level supervision signals for financial reasoning in LLMs and outperforms general PRMs on CFLUE and FinQA benchmarks.

citing papers explorer

Showing 50 of 78 citing papers.

Why Training-Free Token Reduction Collapses: The Inherent Instability of Pairwise Scoring Signals cs.AI · 2026-04-17 · unverdicted · none · ref 1
Pairwise scoring signals in Vision Transformer token reduction are inherently unstable due to high perturbation counts and degrade in deep layers, causing collapse, while unary signals with triage enable CATIS to retain 96.9% accuracy at 63% FLOPs reduction on ViT-Large ImageNet-1K.
Dynamic Tool Dependency Retrieval for Lightweight Function Calling cs.LG · 2025-12-18 · unverdicted · none · ref 1
DTDR dynamically retrieves relevant tools by modeling dependencies from demonstrations and conditioning on the evolving agent plan, improving function calling success rates by 23-104% over static retrievers across benchmarks.
Incremental Data-Driven Policy Synthesis via Game Abstractions cs.GT · 2025-11-14 · unverdicted · none · ref 1
An incremental rank-lifting algorithm updates winning regions and policies in data-driven stochastic game abstractions by exploiting monotonic growth of under-approximations and shrinkage of over-approximations.
Unified Work Embeddings: Contrastive Learning of a Bidirectional Multi-task Ranker cs.CL · 2025-11-11 · unverdicted · none · ref 1
UWE is a task-agnostic bi-encoder that uses many-to-many InfoNCE and token-level soft late interaction to achieve zero-shot ranking across unseen work-related target spaces while using far fewer parameters than Qwen3-8B and improving MAP by 4.4 points.
MAS-Bench: A Unified Benchmark for Shortcut-Augmented Hybrid Mobile GUI Agents cs.AI · 2025-09-08 · conditional · none · ref 1
MAS-Bench introduces 139 tasks, 88 predefined shortcuts, and 9 metrics to evaluate hybrid GUI-shortcut mobile agents, reporting up to 68.3% success and 39% efficiency gains over GUI-only baselines.
MECAT: A Multi-Experts Constructed Benchmark for Fine-Grained Audio Understanding Tasks eess.AS · 2025-07-31 · unverdicted · none · ref 1
MECAT is a multi-expert benchmark for audio AI offering fine-grained captions and QA pairs generated via expert models and LLM reasoning, paired with the DATE metric that combines semantic similarity and cross-sample discriminability to favor detailed outputs.
VoteGCL: Enhancing Graph-based Recommendations with Majority-Voting LLM-Rerank Augmentation cs.IR · 2025-07-29 · unverdicted · none · ref 1
VoteGCL augments graph-based recommendation systems with high-confidence synthetic interactions generated via majority-voting LLM reranks and integrates them into graph contrastive learning to improve accuracy and reduce popularity bias.
One Step is Enough: Multi-Agent Reinforcement Learning based on One-Step Policy Optimization for Order Dispatch on Ride-Sharing Platforms cs.AI · 2025-07-21 · conditional · none · ref 1
OSPO trains optimal order dispatch policies for homogeneous AV fleets using only one-step group rewards, outperforming GRPO on a real ride-hailing dataset.
TRAM: Test-Time Risk Adaptation with Mixture of Agents cs.LG · 2024-08-16 · unverdicted · none · ref 1
TRAM is a test-time mixture method that scores and composes risk-neutral source policies using reward and occupancy-based risk to achieve new reward-risk tradeoffs without parameter updates.
Assessing How Hate, Counterspeech, and Toxicity Affect Hate Group Newcomers cs.CY · 2024-05-28 · unverdicted · none · ref 1
Counterspeech reduces the likelihood that hate-speech-using newcomers continue posting in hate subreddits, though toxic counterspeech raises the chance of continued hostility in the thread.
Attacking the Spike: On the Transferability and Security of Spiking Neural Networks to Adversarial Examples cs.NE · 2022-09-07 · unverdicted · none · ref 41
MDSE attack uses dynamic multi-surrogate gradient estimation to create adversarial examples that simultaneously fool SNNs, ViTs, and CNNs, with reported gains up to 91.4% on ensembles and 3x on adversarially trained SNNs versus Auto-PGD.
When Reasoning Models Hurt Behavioral Simulation: A Solver-Sampler Mismatch in Multi-Agent LLM Negotiation cs.LG · 2026-04-12 · unverdicted · none · ref 1
Stronger reasoning models in LLMs reduce behavioral negotiation by defaulting to authority outcomes in multi-agent settings, unlike structured scaffolds that enable concessions.
Overcoming the "Impracticality" of RAG: Proposing a Real-World Benchmark and Multi-Dimensional Diagnostic Framework cs.CL · 2026-04-03 · unverdicted · none · ref 1
Introduces a four-axis difficulty taxonomy integrated into an enterprise RAG benchmark to systematically diagnose multi-dimensional challenges like reasoning complexity and retrieval difficulty.
Beyond Static: Related Questions Retrieval Through Conversations in Community Question Answering cs.IR · 2026-03-09 · unverdicted · none · ref 1
TeCQR retrieves related questions in cQA by generating tag-enhanced clarifying questions, using noise-tolerant semantic matching, and two-stage training to learn fine-grained representations of queries, questions, and tags.
STORM: Segment, Track, and Object Re-Localization from a Single Image cs.CV · 2025-11-12 · unverdicted · none · ref 1
STORM enables robust reference-conditioned 6D pose tracking from one image via hierarchical spatial fusion attention and a BCE-trained verifier that detects drift for automatic re-initialization.
Routing-Based Continual Learning for Multimodal Large Language Models cs.LG · 2025-11-03 · unverdicted · none · ref 1
Routing architecture for MLLMs enables continual learning with constant compute, matching multi-task learning performance and supporting cross-modal transfer.
Less Precise Can Be More Reliable: A Systematic Evaluation of Quantization's Impact on VLMs Beyond Accuracy cs.CV · 2025-09-25 · unverdicted · none · ref 1 · 2 links
Quantization of VLMs improves multiple reliability metrics beyond accuracy by damping high-rank spectral components and promoting reliance on robust low-rank features.
Forget What's Sensitive, Remember What Matters: Token-Level Differential Privacy in Memory Sculpting for Continual Learning cs.AI · 2025-09-16 · unverdicted · none · ref 1
PeCL applies token-level dynamic differential privacy and privacy-guided memory sculpting to achieve superior privacy-utility balance in continual learning.
LoFT: Parameter-Efficient Fine-Tuning for Long-tailed Semi-Supervised Learning in Open-World Scenarios cs.LG · 2025-09-12 · unverdicted · none · ref 1
LoFT uses parameter-efficient fine-tuning of foundation models for long-tailed semi-supervised learning, supported by proofs that this reduces hypothesis complexity to minimize balanced posterior error and compresses outlier acceptance regions, with LoFT-OW handling open-world OOD cases.
Beyond I'm Sorry, I Can't: Dissecting Large Language Model Refusal cs.CL · 2025-09-07 · unverdicted · none · ref 1
Sparse autoencoders plus greedy filtering and factorization-machine interaction modeling identify minimal sets of features in Gemma-2-2B-IT and LLaMA-3.1-8B-IT whose ablation produces jailbreaks by flipping refusal to compliance.
TeRA: Vector-based Random Tensor Network for High-Rank Adaptation of Large Language Models cs.LG · 2025-09-03 · unverdicted · none · ref 1
TeRA parametrizes high-rank LLM weight updates via a random Tucker-like tensor network with shared frozen factors and layer-specific scaling vectors, matching high-rank adapter performance at vector-level parameter counts.
CogDriver: Integrating Cognitive Inertia for Temporally Coherent Planning in Autonomous Driving cs.CV · 2025-08-31 · unverdicted · none · ref 54
CogDriver-Agent with sparse temporal memory and spatiotemporal distillation on CogDriver-Data achieves 22% higher closed-loop Driving Score on Bench2Drive and 21% lower mean L2 error on nuScenes.
Learning to Refine: Self-Refinement of Parallel Reasoning in LLMs cs.LG · 2025-08-27 · conditional · none · ref 1
GSR jointly trains LLMs to generate candidate solutions and refine a superior final answer from them, achieving state-of-the-art performance on five mathematical benchmarks while transferring across model scales.
Fin-PRM: A Domain-Specialized Process Reward Model for Financial Reasoning in Large Language Models cs.CL · 2025-08-21 · unverdicted · none · ref 1
Fin-PRM is a domain-specialized process reward model that supplies binary step-level and trajectory-level supervision signals for financial reasoning in LLMs and outperforms general PRMs on CFLUE and FinQA benchmarks.
League of LLMs: A Benchmark-Free Paradigm for Mutual Evaluation of Large Language Models cs.AI · 2025-07-30 · unverdicted · none · ref 51
League of LLMs organizes LLMs into a self-governed mutual evaluation league using dynamic, transparent, objective, and professional criteria to distinguish model capabilities with 70.7% top-k ranking stability.
PRISM: Programmatic Reasoning with Image Sequence Manipulation for LVLM Jailbreaking cs.CR · 2025-07-29 · unverdicted · none · ref 1
PRISM decomposes harmful instructions into benign visual gadgets and directs LVLMs via prompts to compose them through reasoning into harmful outputs, achieving ASR over 0.90 on SafeBench.
The Ratchet Effect in Silico through Interaction-Driven Cumulative Intelligence in Large Language Models cs.LG · 2025-07-25 · unverdicted · none · ref 1
Populations of 1-4B parameter LLMs using peer verification and shared cultural memory achieve 8.8-18.9 point gains on mathematical reasoning tasks and close much of the gap to 70B+ single models.
Perception-Aware Policy Optimization for Multimodal Reasoning cs.CL · 2025-07-08 · unverdicted · none · ref 1
PAPO integrates perception-aware supervision via a KL-based loss into RLVR methods like GRPO, yielding 4.4-17.5% gains on multimodal benchmarks and 30.5% fewer perception errors, with larger gains on vision-heavy tasks.
Judge a Book by its Cover: Investigating Multi-Modal LLMs for Multi-Page Handwritten Document Transcription cs.LG · 2025-02-27 · unverdicted · none · ref 1
Introduces OCR+PAGE-1 and OCR+PAGE-N prompting strategies that improve zero-shot multi-page handwritten document transcription by sharing context across pages.
ExPath: Targeted Pathway Inference for Biological Knowledge Bases via Graph Learning and Explanation cs.LG · 2025-02-25 · unverdicted · none · ref 1
ExPath is a subgraph inference framework that classifies bio-networks with experimental data and uses explanations to identify targeted pathways, reporting up to 4.5x higher Fidelity+ and 14x lower Fidelity- than baselines on 301 networks.
ArchRAG: Attributed Community-based Hierarchical Retrieval-Augmented Generation cs.IR · 2025-02-14 · unverdicted · none · ref 1
ArchRAG proposes attributed-community hierarchical indexing and LLM clustering to improve accuracy and lower token usage in graph-based retrieval-augmented generation.
Motion-aware Contrastive Learning for Temporal Panoptic Scene Graph Generation cs.CV · 2024-12-10 · unverdicted · none · ref 1
Motion-aware contrastive learning on mask tubes improves temporal panoptic scene graph generation over pooling-based methods on video and 4D datasets.
Multi-Scale Contrastive Learning for Video Temporal Grounding cs.CV · 2024-12-10 · unverdicted · none · ref 1
A multi-scale and cross-scale contrastive learning framework uses intra-encoder stage features and a new sampling process to link short-range and long-range video moments for temporal grounding.
Explainable Representation of Finite-Memory Policies for POMDPs using Decision Trees cs.AI · 2024-11-20 · unverdicted · none · ref 1
A translation method converts finite-state-controller policies for POMDPs into a decision-tree-plus-Mealy-machine form that is typically smaller and more explainable, with further simplifications for attractor-based policies.
SkillTree: Explainable Skill-Based Deep Reinforcement Learning for Long-Horizon Control Tasks cs.LG · 2024-11-19 · unverdicted · none · ref 55
SkillTree reduces continuous action spaces to discrete skills via a differentiable decision tree in a hierarchical policy, achieving comparable performance to neural skill methods with added skill-level explainability in robotic arm tasks.
Bayesian Inverse Transition Learning: Learning Dynamics From Near-Optimal Trajectories cs.LG · 2024-11-07 · unverdicted · none · ref 1
A Bayesian method uses near-optimality constraints from expert trajectories to estimate transition dynamics in offline model-based reinforcement learning.
Uncovering the Internet's Hidden Values: An Empirical Study of Desirable Behavior Using Highly-Upvoted Content on Reddit cs.HC · 2024-10-16 · unverdicted · none · ref 1
LLM analysis of highly-upvoted Reddit comments yields 64-72 macro/meso/micro values per year; existing prosocial measures capture only 18% on average while the method also recovers and extends prior qualitative taxonomies.
MVIGER: Multi-View Variational Integration of Complementary Knowledge for Generative Recommender cs.IR · 2024-08-16 · unverdicted · none · ref 1
MVIGER integrates complementary knowledge from diverse prompts and indices in generative recommenders via a variational model with learnable prior over latent sources, showing superior performance on three datasets.
LaMSUM: Amplifying Voices Against Harassment through LLM Guided Extractive Summarization of User Incident Reports cs.CL · 2024-06-22 · unverdicted · none · ref 1
LaMSUM is a novel multi-level LLM framework with voting methods for extractive summarization of large incident report collections that outperforms prior extractive methods.
READ: Recurrent Adapter with Partial Video-Language Alignment for Parameter-Efficient Transfer Learning in Low-Resource Video-Language Modeling cs.CV · 2023-12-12 · unverdicted · none · ref 34
READ recurrent adapters with partial video-language alignment via optimal transport outperform standard fine-tuning on low-resource temporal grounding and summarization tasks.
Studying Lobby Influence in the European Parliament cs.CL · 2023-09-20 · unverdicted · none · ref 1
NLP comparison of lobby papers and MEP speeches discovers influence links validated indirectly via retweets and meetings, achieving AUC 0.77 and ideological alignment in aggregate analysis.
Image Captioning via Compact Bidirectional Architecture cs.CV · 2022-01-06 · unverdicted · none · ref 1
Compact bidirectional transformer integrates L2R and R2L flows with sentence-level ensemble and two-flow self-critical training to achieve SOTA on MSCOCO without vision-language pretraining.
End-to-end PDDL Planning with Hardcoded and Dynamic Agents cs.AI · 2025-12-10 · unverdicted · none · ref 1
An end-to-end LLM framework refines natural language into valid PDDL domains and problems via hardcoded and dynamic agents, generates plans with standard engines, and returns readable output.
End-to-end Contrastive Language-Speech Pretraining Model For Long-form Spoken Question Answering cs.SD · 2025-11-12 · unverdicted · none · ref 1
CLSR is an end-to-end contrastive language-speech retriever using an intermediate text-like conversion step to improve retrieval of relevant segments from long audio for spoken question answering.
COGNOS: Universal Enhancement for Time Series Anomaly Detection via Constrained Gaussian-Noise Optimization and Smoothing cs.LG · 2025-11-10 · unverdicted · none · ref 1
COGNOS improves reconstruction-based time series anomaly detection by enforcing Gaussian white noise residuals through training regularization and applying adaptive Kalman smoothing to produce more stable anomaly scores.
MULTIBENCH++: A Unified and Comprehensive Multimodal Fusion Benchmarking Across Specialized Domains cs.LG · 2025-11-09 · unverdicted · none · ref 1
MULTIBENCH++ is a new large-scale benchmark integrating over 30 datasets across 15 modalities and 20 tasks, accompanied by an open-source automated evaluation pipeline that establishes new performance baselines for multimodal fusion.
Towards Adaptive Humanoid Control via Multi-Behavior Distillation and Reinforced Fine-Tuning cs.RO · 2025-11-09 · unverdicted · none · ref 1
A two-stage distillation plus reinforced fine-tuning approach produces a single humanoid locomotion controller that adapts across skills and irregular terrains.
Bridging the phenotype-target gap for molecular generation via multi-objective reinforcement learning cs.LG · 2025-09-25 · unverdicted · none · ref 1
SmilesGEN uses dual VAEs to jointly model drug structures and transcriptional responses, generating molecules with higher validity, novelty, and similarity to known ligands than prior methods.
Simulating Online Social Media Conversations on Controversial Topics Using AI Agents Calibrated on Real-World Data cs.SI · 2025-09-23 · conditional · none · ref 1
LLM agents calibrated on Italian election data produce coherent posts and realistic network structure but show less tone and toxicity variation than real users, with opinion changes resembling traditional mathematical models.
LifeAlign: Lifelong Alignment for Large Language Models with Memory-Augmented Focalized Preference Optimization cs.CL · 2025-09-21 · unverdicted · none · ref 1
LifeAlign uses focalized preference optimization and short-to-long memory consolidation via dimensionality reduction to let LLMs align with new preferences while retaining prior knowledge.
