Info-Gain Sampler improves MDM decoding by using bidirectional information gain to reduce cumulative uncertainty, outperforming greedy samplers on reasoning accuracy and creative writing tasks.
hub
Hierarchical neural story generation.CoRR, abs/1805.04833
22 Pith papers cite this work. Polarity classification is still indexing.
abstract
We explore story generation: creative systems that can build coherent and fluent passages of text about a topic. We collect a large dataset of 300K human-written stories paired with writing prompts from an online forum. Our dataset enables hierarchical story generation, where the model first generates a premise, and then transforms it into a passage of text. We gain further improvements with a novel form of model fusion that improves the relevance of the story to the prompt, and adding a new gated multi-scale self-attention mechanism to model long-range context. Experiments show large improvements over strong baselines on both automated and human evaluations. Human judges prefer stories generated by our approach to those from a strong non-hierarchical model by a factor of two to one.
hub tools
citation-role summary
citation-polarity summary
representative citing papers
Top-W applies Wasserstein-regularized truncation on token-embedding geometry to create a closed-form optimal crop for LLM sampling that outperforms prior methods by up to 33.7% on GSM8K, GPQA, AlpacaEval, and MT-Bench.
Top-H decoding is a computationally efficient greedy algorithm for an entropy-constrained mass maximization problem that improves the creativity-coherence trade-off over min-p sampling in LLM text generation.
SafeDec uses constrained decoding to ensure autoregressive robot navigation foundation models generate actions that provably satisfy STL safety specifications under assumed dynamics.
The authors generate and publicly release the first large-scale open dataset of three million structured moral fables produced by small open language models together with a reproducible LLM-judge evaluation pipeline.
Chronos pretrains transformer models on tokenized time series to deliver strong zero-shot forecasting across diverse domains.
Off-policy training data for LLM behavior probes causes significant generalization failures especially for intent-based behaviors like deception, and performance on coerced incentivised data correlates with real on-policy success.
CARRIAGE is a RAG framework that improves output diversity in cross-cultural recipe adaptation by enhancing retrieval and context handling, reaching Pareto efficiency on diversity and quality versus closed-book LLMs.
CTRL is a large conditional transformer language model that uses naturally occurring control codes to steer text generation style and content.
Annotation-anchored training reduces semantic diversity collapse in post-trained language models by a factor of six compared to standard supervised fine-tuning while preserving instruction-following and improving with scale.
UAG is a universal avoidance generation method that increases multi-branch diversity in diffusion and transformer models by penalizing output similarity, delivering up to 1.9x higher diversity with 4.4x speed and 1/64th the FLOPs of prior methods.
Narrix helps novices identify and reuse narrative strategies from examples through visualization and strategy-steered generation, improving retention, confidence, and adaptation over chat interfaces in a 12-person study.
Persona axes derived from contrastive prompts and PCA yield linear probes that generalize better than raw-activation probes across 10 datasets for deception and sycophancy.
RAFT aligns generative models by ranking samples with a reward model and fine-tuning only on the top-ranked outputs, reporting gains on reward scores and automated metrics for LLMs and diffusion models.
Proposes the CSI framework for co-designing visual interactions and deep learning models to expose and allow semantic control over intermediate reasoning processes, shown in a summarization case study.
APCD adaptively branches LLM decoding paths based on token entropy and contrasts divergent paths to improve factual accuracy while preserving efficiency.
An AST pattern-matching prototype with a custom DSL achieves 0.74 average F1-score on a BigCloneEval subset, outperforming CodeLlama (0.35) and code clone detectors (best recall 0.20).
WriterForcing combines keyphrase attention and non-generic word promotion in Seq2Seq models to produce more diverse and interesting story endings.
Hybrid LLM plus static analysis for algorithm recognition in code cuts required model calls by 72-97% and lifts F1-scores by as much as 12 points.
Sentence-level models outperform skeleton-based approaches for narrative coherence despite a new SSN network improving on cosine and Euclidean baselines.
A survey that organizes LLMs-as-judges research into functionality, methodology, applications, meta-evaluation, and limitations.
citing papers explorer
-
Improving Sampling for Masked Diffusion Models via Information Gain
Info-Gain Sampler improves MDM decoding by using bidirectional information gain to reduce cumulative uncertainty, outperforming greedy samplers on reasoning accuracy and creative writing tasks.
-
Geometry-Aware Decoding with Wasserstein-Regularized Truncation and Mass Penalties for Large Language Models
Top-W applies Wasserstein-regularized truncation on token-embedding geometry to create a closed-form optimal crop for LLM sampling that outperforms prior methods by up to 33.7% on GSM8K, GPQA, AlpacaEval, and MT-Bench.
-
Top-H Decoding: Adapting the Creativity and Coherence with Bounded Entropy in Text Generation
Top-H decoding is a computationally efficient greedy algorithm for an entropy-constrained mass maximization problem that improves the creativity-coherence trade-off over min-p sampling in LLM text generation.
-
Constrained Decoding for Safe Robot Navigation Foundation Models
SafeDec uses constrained decoding to ensure autoregressive robot navigation foundation models generate actions that provably satisfy STL safety specifications under assumed dynamics.
-
TF1-EN-3M: Three Million Synthetic Moral Fables for Training Small, Open Language Models
The authors generate and publicly release the first large-scale open dataset of three million structured moral fables produced by small open language models together with a reproducible LLM-judge evaluation pipeline.
-
Chronos: Learning the Language of Time Series
Chronos pretrains transformer models on tokenized time series to deliver strong zero-shot forecasting across diverse domains.
-
The Impact of Off-Policy Training Data on Probe Generalisation
Off-policy training data for LLM behavior probes causes significant generalization failures especially for intent-based behaviors like deception, and performance on coerced incentivised data correlates with real on-policy success.
-
Culinary Crossroads: A RAG Framework for Enhancing Diversity in Cross-Cultural Recipe Adaptation
CARRIAGE is a RAG framework that improves output diversity in cross-cultural recipe adaptation by enhancing retrieval and context handling, reaching Pareto efficiency on diversity and quality versus closed-book LLMs.
-
CTRL: A Conditional Transformer Language Model for Controllable Generation
CTRL is a large conditional transformer language model that uses naturally occurring control codes to steer text generation style and content.
-
Annotations Mitigate Post-Training Mode Collapse
Annotation-anchored training reduces semantic diversity collapse in post-trained language models by a factor of six compared to standard supervised fine-tuning while preserving instruction-following and improving with scale.
-
A Universal Avoidance Method for Diverse Multi-branch Generation
UAG is a universal avoidance generation method that increases multi-branch diversity in diffusion and transformer models by penalizing output similarity, delivering up to 1.9x higher diversity with 4.4x speed and 1/64th the FLOPs of prior methods.
-
Narrix: Remixing Narrative Strategies from Examples for Story Writing
Narrix helps novices identify and reuse narrative strategies from examples through visualization and strategy-steered generation, improving retention, confidence, and adaptation over chat interfaces in a 12-person study.
-
Do Linear Probes Generalize Better in Persona Coordinates?
Persona axes derived from contrastive prompts and PCA yield linear probes that generalize better than raw-activation probes across 10 datasets for deception and sycophancy.
-
RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
RAFT aligns generative models by ranking samples with a reward model and fine-tuning only on the top-ranked outputs, reporting gains on reward scores and automated metrics for LLMs and diffusion models.
-
Visual Interaction with Deep Learning Models through Collaborative Semantic Inference
Proposes the CSI framework for co-designing visual interactions and deep learning models to expose and allow semantic control over intermediate reasoning processes, shown in a summarization case study.
-
APCD: Adaptive Path-Contrastive Decoding for Reliable Large Language Model Generation
APCD adaptively branches LLM decoding paths based on token entropy and contrasts divergent paths to improve factual accuracy while preserving efficiency.
-
Exploring the Effectiveness of Abstract Syntax Tree Patterns for Algorithm Recognition
An AST pattern-matching prototype with a custom DSL achieves 0.74 average F1-score on a BigCloneEval subset, outperforming CodeLlama (0.35) and code clone detectors (best recall 0.20).
-
WriterForcing: Generating more interesting story endings
WriterForcing combines keyphrase attention and non-generic word promotion in Seq2Seq models to produce more diverse and interesting story endings.
-
Combining Static Code Analysis and Large Language Models Improves Correctness and Performance of Algorithm Recognition
Hybrid LLM plus static analysis for algorithm recognition in code cuts required model calls by 72-97% and lifts F1-scores by as much as 12 points.
-
Skeleton-based Coherence Modeling in Narratives
Sentence-level models outperform skeleton-based approaches for narrative coherence despite a new SSN network improving on cosine and Euclidean baselines.
-
LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods
A survey that organizes LLMs-as-judges research into functionality, methodology, applications, meta-evaluation, and limitations.
- How Human-Like Are Large Language Models? A Register-Aware Linguistic Evaluation Framework