Recognition: unknown
Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks
read the original abstract
Recurrent Neural Networks can be trained to produce sequences of tokens given some input, as exemplified by recent results in machine translation and image captioning. The current approach to training them consists of maximizing the likelihood of each token in the sequence given the current (recurrent) state and the previous token. At inference, the unknown previous token is then replaced by a token generated by the model itself. This discrepancy between training and inference can yield errors that can accumulate quickly along the generated sequence. We propose a curriculum learning strategy to gently change the training process from a fully guided scheme using the true previous token, towards a less guided scheme which mostly uses the generated token instead. Experiments on several sequence prediction tasks show that this approach yields significant improvements. Moreover, it was used successfully in our winning entry to the MSCOCO image captioning challenge, 2015.
This paper has not been read by Pith yet.
Forward citations
Cited by 4 Pith papers
-
UniVLR: Unifying Text and Vision in Visual Latent Reasoning for Multimodal LLMs
UniVLR unifies textual and visual reasoning in multimodal LLMs by compressing reasoning traces and auxiliary images into visual latent tokens for direct inference without interleaved text CoT.
-
TimeLesSeg: Unified Contrast-Agnostic Cross-Sectional and Longitudinal MS Lesion Segmentation via a Stochastic Generative Model
TimeLesSeg delivers a unified contrast-agnostic CNN for MS lesion segmentation that seamlessly handles both cross-sectional and longitudinal inputs by combining empty prior masks with stochastic morphological deformat...
-
AE-ViT: Stable Long-Horizon Parametric Partial Differential Equations Modeling
AE-ViT combines a convolutional autoencoder with a latent-space transformer and multi-stage parameter plus coordinate injection to deliver stable long-horizon predictions for parametric PDEs, cutting relative rollout ...
-
Graph Neural ODE Digital Twins for Control-Oriented Reactor Thermal-Hydraulic Forecasting Under Partial Observability
A GNN-ODE digital twin forecasts reactor thermal-hydraulic states under partial observability, achieving low error on held-out transients and recovering a physical heat-transfer correlation during sim-to-real adaptation.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.