Title resolution pending

Bowen Peng, Jeffrey Quesnelle, Honglu Fan, Enrico Shippole , booktitle= · 2024

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

browse 3 citing papers

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

Surprisal Minimisation over Goal-directed Alternatives Predicts Production Choice in Dialogue

cs.CL · 2026-05-01 · unverdicted · novelty 7.0

Surprisal minimization over goal-directed alternatives generated by language models provides the strongest account of production choices in open-ended dialogue compared to uniform information density or length-based costs.

Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations

cs.LG · 2024-02-27 · unverdicted · novelty 7.0

HSTU-based generative recommenders with 1.5 trillion parameters scale as a power law with compute up to GPT-3 scale, outperform baselines by up to 65.8% NDCG, run 5-15x faster than FlashAttention2 on long sequences, and improve online A/B metrics by 12.4%.

Long-Context Reasoning Through Proxy-Based Chain-of-Thought Tuning

cs.CL · 2026-04-06 · unverdicted · novelty 5.0 · 2 refs

ProxyCoT transfers CoT reasoning from proxy short contexts to full long contexts through RL/distillation followed by SFT, outperforming baselines with lower overhead and generalizing out-of-domain.

citing papers explorer

Showing 3 of 3 citing papers.

Surprisal Minimisation over Goal-directed Alternatives Predicts Production Choice in Dialogue cs.CL · 2026-05-01 · unverdicted · none · ref 36
Surprisal minimization over goal-directed alternatives generated by language models provides the strongest account of production choices in open-ended dialogue compared to uniform information density or length-based costs.
Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations cs.LG · 2024-02-27 · unverdicted · none · ref 22
HSTU-based generative recommenders with 1.5 trillion parameters scale as a power law with compute up to GPT-3 scale, outperform baselines by up to 65.8% NDCG, run 5-15x faster than FlashAttention2 on long sequences, and improve online A/B metrics by 12.4%.
Long-Context Reasoning Through Proxy-Based Chain-of-Thought Tuning cs.CL · 2026-04-06 · unverdicted · none · ref 23 · 2 links
ProxyCoT transfers CoT reasoning from proxy short contexts to full long contexts through RL/distillation followed by SFT, outperforming baselines with lower overhead and generalizing out-of-domain.

Title resolution pending

fields

years

verdicts

representative citing papers

citing papers explorer