(2019, July)
8 Pith papers cite this work. Polarity classification is still indexing.
Verdicts: UNVERDICTED (8)
Roles: background (1)
Polarities: background (1)
Citing papers explorer
- SCICONVBENCH: Benchmarking LLMs on Multi-Turn Clarification for Task Formulation in Computational Science
  SCICONVBENCH is a new benchmark evaluating LLMs on multi-turn disambiguation and inconsistency resolution for task formulation in computational science, with frontier models reaching only 52.7% success on fluid mechanics disambiguation cases.
- A method for the systematic generation of graph XAI benchmarks via Weisfeiler-Leman coloring
  A systematic method leveraging Weisfeiler-Leman coloring to mine class-discriminating motifs as proxy explanations, enabling the creation of the OpenGraphXAI benchmark suite from real-world datasets.
- Dual Hierarchical Dialogue Policy Learning for Legal Inquisitive Conversational Agents
  A dual hierarchical RL framework with two agents coordinates high-level dialogue strategy and low-level question generation to emulate judicial questioning and extract key information from Supreme Court arguments, outperforming baselines.
- Efficient Retrieval Scaling with Hierarchical Indexing for Large Scale Recommendation
  A jointly learned hierarchical index with cross-attention and residual quantization scales exact retrieval in foundational recommendation models; it is deployed at Meta, and test-time training on index nodes yields additional performance gains.
- ClusterRAG: Cluster-Based Collaborative Filtering for Personalized Retrieval-Augmented Generation
  ClusterRAG applies density-based clustering to user profiles for collaborative retrieval in personalized RAG and reports the best performance on LaMP tasks by combining target and similar-user profiles.
- MirrorBench: A Benchmark to Evaluate Conversational User-Proxy Agents for Human-Likeness
  MirrorBench defines a reproducible benchmark combining lexical metrics (MATTR, Yule's K, HD-D) and LLM-judge metrics with calibration controls to measure human-likeness of user-proxy agents across four datasets.
- TextBridgeGNN: Pre-training Graph Neural Network for Cross-Domain Recommendation via Text-Guided Transfer
  TextBridgeGNN pre-trains GNNs using text-guided hierarchical propagation to enable effective cross-domain knowledge transfer in recommendation.
- Dual-Perspective Disentangled Multi-Intent Alignment for Enhanced Collaborative Filtering
  DMICF models interactions from user- and item-centric perspectives with a macro-micro prototype-aware variational encoder and dimension-wise intent alignment to improve collaborative filtering.