Mixed citations
Warren, Lu Cheng, Haidar M
Mixed citation behavior. Most common role is background (62%).
citing papers explorer
- PackSELL: A Sparse Matrix Format for Precision-Agnostic High-Performance SpMV
PackSELL packs delta-encoded indices and values into single words with tunable bit allocation, delivering up to 1.63x faster FP16 SpMV and FP32-accurate performance exceeding FP16 cuSPARSE while reducing memory traffic.
- Using LLM-as-a-Judge/Jury to Advance Scalable, Clinically-Validated Safety Evaluations of Model Responses to Users Demonstrating Psychosis
Seven clinician-informed safety criteria enable LLM-as-a-Judge to reach substantial agreement with human consensus (Cohen's κ up to 0.75) on evaluating LLM responses to users demonstrating psychosis.
- Revisiting Forest Proximities via Sparse Leaf-Incidence Kernels
Forest proximities admit an exact sparse factorization via separable weighted leaf-collision kernels that reduces computation to sparse linear algebra over leaf collisions.
- AlphaEvolve: A coding agent for scientific and algorithmic discovery
AlphaEvolve is an LLM-orchestrated evolutionary coding agent that discovered a 4x4 complex matrix multiplication algorithm using 48 scalar multiplications, the first improvement over Strassen's algorithm in 56 years, plus optimizations for Google data centers and hardware.
- PIPER: Content-Based Table Search via profiling and LLM-Generated Pseudoqueries
PIPER retrieves and ranks tabular datasets by profiling their content and using LLM-generated queries for dense vector search, outperforming metadata baselines and TableQA methods in low-metadata settings.
- Enhancing Multilingual Counterfactual Generation through Alignment-as-Preference Optimization
Macro uses Direct Preference Optimization on composite-scored preference pairs to improve validity of multilingual self-generated counterfactual explanations by 12.55% on average without degrading minimality.
- Similar Pattern Annotation via Retrieval Knowledge for LLM-Based Test Code Fault Localization
SPARK improves LLM-based test code fault localization by retrieving similar past faults and selectively annotating suspicious lines in new failing tests.
- NEURON: A Neuro-symbolic System for Grounded Clinical Explainability
NEURON raises AUC from 0.74-0.77 to 0.84-0.88 on MIMIC-IV heart-failure mortality prediction while lifting human-aligned explanation scores from 0.50 to 0.85 by grounding SHAP values in SNOMED CT and patient notes via RAG-LLM.
- Make Any Collection Navigable: Methods for Constructing and Evaluating Hypergraph of Text
Proposes methods for constructing Hypergraphs of Text, along with a new effort ratio metric; in the experiments, TF-IDF baselines match LLM-based methods.
- A Catalog of Data Errors
Presents a catalog that classifies 35 data error types for tabular data into missing, incorrect, and redundant categories, with definitions and examples to improve data quality management.
- MONETA: Multimodal Industry Classification through Geographic Information with Multi Agent Systems
MONETA is the first multimodal benchmark for industry classification using text and geographic sources, with MLLM baselines at 62-74% accuracy and up to 22.8% gains from multi-turn context enrichment and explanations.
- Counterfactual Modeling with Fine-Tuned LLMs for Health Intervention Design and Sensor Data Augmentation
Fine-tuned LLMs produce plausible counterfactuals for health interventions and recover 20% F1 via data augmentation in label-scarce sensor datasets.
- InsideOut: Measuring and Mitigating Insider-Outsider Bias in Interview Script Generation
The paper introduces the InsideOut benchmark to quantify insider-outsider bias in LLM-generated interview scripts across 10 cultures and shows that multi-agent mitigation frameworks substantially reduce the bias on metrics like Cultural Alignment Gap.
- AgentRx: A Benchmark Study of LLM Agents for Multimodal Clinical Prediction Tasks
Single-agent LLM frameworks outperform naive multi-agent systems in multimodal clinical risk prediction tasks and are better calibrated.
- Context-Mediated Domain Adaptation in Multi-Agent Sensemaking Systems
Context-mediated domain adaptation treats user modifications to AI artifacts as implicit domain specifications that reshape LLM-powered multi-agent reasoning, demonstrated via the Seedentia system which extracted 46 domain knowledge entries from expert edits.
- Opportunities and Risks of Generative AI through the Health Information Journey
The authors propose a four-stage framework for analyzing the opportunities and risks of generative AI across the health information journey, from public sources to clinical care.
- Modality vs. Morphology: A Framework for Time Series Classification for Biological Signals
A review synthesizes evidence from EEG, EMG, ECG, PPG and ocular signals to argue that waveform morphology, rather than modality or model class, primarily determines TSC performance and interpretability.
- Assessment of RAG and Fine-Tuning for Industrial Question-Answering-Applications
RAG is more effective and cost-efficient than fine-tuning for industrial QA adaptation on automotive datasets.
- Towards Enabling An Artificial Self-Construction Software Life-cycle via Autopoietic Architectures
Proposes autopoietic architectures for self-constructing software as a fundamental shift in the SDLC, leveraging foundation models for autonomous evolution and maintenance.
- LLM Harms: A Taxonomy and Discussion