Title resolution pending

Brown, T · 1901

27 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

WinDeskGround: A Benchmark for Robust GUI Grounding in Complex Multi-Window Desktop Environments

cs.CV · 2026-05-13 · unverdicted · novelty 7.0

WinDeskGround is a parametrically generated benchmark of 1,356 instruction-target pairs that reveals accuracy declines in state-of-the-art MLLMs under partial occlusion in multi-window GUI settings.

Discrete Prototypical Memories for Federated Time Series Foundation Models

cs.LG · 2026-04-06 · unverdicted · novelty 7.0

FeDPM learns and aligns local discrete prototypical memories across domains to create a unified discrete latent space for LLM-based time series foundation models in a federated setting.

GhostServe: A Lightweight Checkpointing System in the Shadow for Fault-Tolerant LLM Serving

cs.DC · 2026-03-26 · unverdicted · novelty 7.0

GhostServe applies erasure coding to the KV cache in host memory for fast recovery from failures in LLM serving, cutting checkpointing latency by up to 2.7x and recovery latency by 2.1x versus prior methods.

On the Overscaling Curse of Parallel Thinking: System Efficacy Contradicts Sample Efficiency

cs.LG · 2026-01-29 · unverdicted · novelty 7.0

Parallel thinking in LLMs suffers from overscaling, in which fixed global budgets waste samples; LanBo predicts per-sample budgets from latent states to raise utilization without hurting accuracy.

Analytical Provisioning for Attention-FFN Disaggregated LLM Serving under Stochastic Workloads

cs.LG · 2026-01-29 · unverdicted · novelty 7.0

A renewal-reward analysis yields a closed-form mean-field rule for the optimal Attention/FFN provisioning ratio in disaggregated LLM serving that accounts for stochastic KV-cache growth and matches simulation optima within 10%.

Mitigating Mask Prior Drift and Positional Attention Collapse in Large Diffusion Vision-Language Models

cs.CV · 2026-05-14 · unverdicted · novelty 6.0 · 2 refs

Diagnoses mask prior drift and positional attention collapse in LDVLMs and introduces two plug-and-play decoding interventions that raise long-form generation quality without retraining.

Anon: Extrapolating Adaptivity Beyond SGD and Adam

cs.AI · 2026-05-04 · unverdicted · novelty 6.0

The Anon optimizer uses tunable adaptivity and an incremental delay update to achieve convergence guarantees and outperform existing methods on image classification, diffusion, and language modeling tasks.

Improving Zero-Shot Offline RL via Behavioral Task Sampling

cs.AI · 2026-04-28 · unverdicted · novelty 6.0

Extracting task vectors from the offline dataset for policy training improves zero-shot offline RL performance by an average of 20% over random sampling baselines.

Dataset-Level Metrics Attenuate Non-Determinism: A Fine-Grained Non-Determinism Evaluation in Diffusion Language Models

cs.LG · 2026-04-15 · unverdicted · novelty 6.0

Dataset-level metrics in diffusion language models mask substantial sample-level non-determinism that varies with model and system factors, which a new Factor Variance Attribution metric can decompose.

BiSpikCLM: A Spiking Language Model integrating Softmax-Free Spiking Attention and Spike-Aware Alignment Distillation

cs.NE · 2026-04-14 · unverdicted · novelty 6.0

BiSpikCLM is the first fully binary spiking MatMul-free causal language model that matches ANN performance on generation tasks using only 4-6 percent of the compute via softmax-free spiking attention and spike-aware distillation.

A Foundation Model for Instruction-Conditioned In-Context Time Series Tasks

cs.LG · 2026-03-23 · unverdicted · novelty 6.0 · 2 refs

iAmTime is a time-series foundation model that uses instruction-conditioned in-context learning from demonstrations to perform zero-shot adaptation on forecasting, imputation, classification, and related tasks.

SiameseNorm: Breaking the Barrier to Reconciling Pre/Post-Norm

cs.LG · 2026-02-08 · unverdicted · novelty 6.0

SiameseNorm is a two-stream architecture that reconciles Pre-Norm and Post-Norm in Transformers by coupling streams via shared residual blocks, yielding performance gains with maintained stability on language, vision, and diffusion models.

SPOT: Selective Prompt Projection via Total Variation for Inference-Only Safe Text-to-Image Generation

cs.AI · 2026-01-31 · unverdicted · novelty 6.0

SPOT projects prompts to a tau-safe set via total variation to cut inappropriate content 14-44% relative to baselines while preserving benign prompt behavior in frozen T2I models.

The Geometric Reasoner: Manifold-Informed Latent Foresight Search for Long-Context Reasoning

cs.LG · 2026-01-25 · unverdicted · novelty 6.0

TGR performs manifold-informed latent foresight search to boost trajectory coverage in long-context reasoning tasks by up to 13 AUC points with minimal overhead.

BOOST: BOttleneck-Optimized Scalable Training Framework for Low-Rank Large Language Models

cs.LG · 2025-12-13 · unverdicted · novelty 6.0

BOOST delivers 1.46-2.27x end-to-end speedups for low-rank bottleneck LLMs by redesigning tensor parallelism around the bottleneck structure, together with supporting optimizations.

AlphaCast: A Human Wisdom-LLM Intelligence Co-Reasoning Framework for Interactive Time Series Forecasting

cs.AI · 2025-11-12 · conditional · novelty 6.0

AlphaCast is a training-free LLM framework that performs interactive multi-stage reasoning for time series forecasting by integrating feature extraction, knowledge bases, case libraries, and contextual pools.

MoE-LLaVA: Mixture of Experts for Large Vision-Language Models

cs.CV · 2024-01-29 · conditional · novelty 6.0

MoE-LLaVA applies mixture-of-experts sparsity to LVLMs via MoE-Tuning, delivering LLaVA-1.5-7B-level visual understanding and better hallucination resistance with only ~3B active parameters.

The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only

cs.CL · 2023-06-01 · unverdicted · novelty 6.0

Properly filtered web data from CommonCrawl alone trains LLMs that significantly outperform models trained on The Pile, with 600 billion tokens and 1.3B/7.5B parameter models released.

PaLM-E: An Embodied Multimodal Language Model

cs.LG · 2023-03-06 · conditional · novelty 6.0

PaLM-E is a single 562B-parameter multimodal model that performs embodied reasoning tasks like robotic manipulation planning and visual question answering by interleaving vision, state, and text inputs with positive transfer from joint training on language and robotics data.

FAAST: Forward-Only Associative Learning via Closed-Form Fast Weights for Test-Time Supervised Adaptation

cs.LG · 2026-05-06 · unverdicted · novelty 5.0 · 2 refs

FAAST performs test-time supervised adaptation by analytically deriving fast weights from examples in one forward pass, matching backprop performance with over 90% less adaptation time and up to 95% memory savings versus memory-based methods.

DRAFT: Task Decoupled Latent Reasoning for Agent Safety

cs.LG · 2026-02-11 · unverdicted · novelty 5.0

DRAFT decouples agent safety judgment into latent extraction and reasoning stages, raising average benchmark accuracy from 63.27% to 91.18%.

Towards Understanding, Analyzing, and Optimizing Agentic AI Execution: A CPU-Centric Perspective

cs.AI · 2025-11-01 · conditional · novelty 5.0

The paper analyzes CPU bottlenecks in agentic AI serving, selects representative workloads, and demonstrates that CPU-aware scheduling optimizations COMB and MAS can reduce P50 latency by up to 1.7x and total latency by up to 2.49x on two hardware systems.

Unlock the Potential of Large Language Models for Predictive Tabular Tasks in Data Science with Table-Specific Pretraining

cs.LG · 2024-03-29 · unverdicted · novelty 5.0

Table-specific pretraining of Llama-2 yields significant gains on zero-shot, few-shot, and in-context tabular prediction tasks over prior benchmarks.

Scalable Reinforcement Learning via Adaptive Batch Scaling

stat.ML · 2026-05-20

citing papers explorer

Showing 27 of 27 citing papers.

WinDeskGround: A Benchmark for Robust GUI Grounding in Complex Multi-Window Desktop Environments cs.CV · 2026-05-13 · unverdicted · none · ref 3
Discrete Prototypical Memories for Federated Time Series Foundation Models cs.LG · 2026-04-06 · unverdicted · none · ref 3
GhostServe: A Lightweight Checkpointing System in the Shadow for Fault-Tolerant LLM Serving cs.DC · 2026-03-26 · unverdicted · none · ref 4
On the Overscaling Curse of Parallel Thinking: System Efficacy Contradicts Sample Efficiency cs.LG · 2026-01-29 · unverdicted · none · ref 4
Analytical Provisioning for Attention-FFN Disaggregated LLM Serving under Stochastic Workloads cs.LG · 2026-01-29 · unverdicted · none · ref 1
Mitigating Mask Prior Drift and Positional Attention Collapse in Large Diffusion Vision-Language Models cs.CV · 2026-05-14 · unverdicted · none · ref 3 · 2 links
Anon: Extrapolating Adaptivity Beyond SGD and Adam cs.AI · 2026-05-04 · unverdicted · none · ref 1
Improving Zero-Shot Offline RL via Behavioral Task Sampling cs.AI · 2026-04-28 · unverdicted · none · ref 2
Dataset-Level Metrics Attenuate Non-Determinism: A Fine-Grained Non-Determinism Evaluation in Diffusion Language Models cs.LG · 2026-04-15 · unverdicted · none · ref 4
BiSpikCLM: A Spiking Language Model integrating Softmax-Free Spiking Attention and Spike-Aware Alignment Distillation cs.NE · 2026-04-14 · unverdicted · none · ref 2
A Foundation Model for Instruction-Conditioned In-Context Time Series Tasks cs.LG · 2026-03-23 · unverdicted · none · ref 5 · 2 links
SiameseNorm: Breaking the Barrier to Reconciling Pre/Post-Norm cs.LG · 2026-02-08 · unverdicted · none · ref 3
SPOT: Selective Prompt Projection via Total Variation for Inference-Only Safe Text-to-Image Generation cs.AI · 2026-01-31 · unverdicted · none · ref 3
The Geometric Reasoner: Manifold-Informed Latent Foresight Search for Long-Context Reasoning cs.LG · 2026-01-25 · unverdicted · none · ref 5
BOOST: BOttleneck-Optimized Scalable Training Framework for Low-Rank Large Language Models cs.LG · 2025-12-13 · unverdicted · none · ref 3
AlphaCast: A Human Wisdom-LLM Intelligence Co-Reasoning Framework for Interactive Time Series Forecasting cs.AI · 2025-11-12 · conditional · none · ref 3
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models cs.CV · 2024-01-29 · conditional · none · ref 3
The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only cs.CL · 2023-06-01 · unverdicted · none · ref 14
PaLM-E: An Embodied Multimodal Language Model cs.LG · 2023-03-06 · conditional · none · ref 5
FAAST: Forward-Only Associative Learning via Closed-Form Fast Weights for Test-Time Supervised Adaptation cs.LG · 2026-05-06 · unverdicted · none · ref 3 · 2 links
DRAFT: Task Decoupled Latent Reasoning for Agent Safety cs.LG · 2026-02-11 · unverdicted · none · ref 2
Towards Understanding, Analyzing, and Optimizing Agentic AI Execution: A CPU-Centric Perspective cs.AI · 2025-11-01 · conditional · none · ref 5
Unlock the Potential of Large Language Models for Predictive Tabular Tasks in Data Science with Table-Specific Pretraining cs.LG · 2024-03-29 · unverdicted · none · ref 1
Scalable Reinforcement Learning via Adaptive Batch Scaling stat.ML · 2026-05-20 · unreviewed · ref 3
CalM: A Self-Supervised Foundation Model for Population Dynamics in Calcium Imaging Data q-bio.QM · 2026-04-03 · unreviewed · ref 4
The Silent Thought: Modeling Internal Cognition in Full-Duplex Spoken Dialogue Models via Latent Reasoning eess.AS · 2026-03-18 · unreviewed · ref 4 · 2 links
HE-SNR: Uncovering Latent Logic via Entropy for Guiding Mid-Training on SWE-bench cs.LG · 2026-01-28 · unreviewed · ref 2