
arXiv: 1601.06759 · v3 · pith:RJBPYNHNnew · submitted 2016-01-25 · cs.CV · cs.LG · cs.NE

Pixel Recurrent Neural Networks

classification cs.CV cs.LG cs.NE
keywords image, recurrent, deep, images, model, natural, networks, neural

Modeling the distribution of natural images is a landmark problem in unsupervised learning. This task requires an image model that is at once expressive, tractable and scalable. We present a deep neural network that sequentially predicts the pixels in an image along the two spatial dimensions. Our method models the discrete probability of the raw pixel values and encodes the complete set of dependencies in the image. Architectural novelties include fast two-dimensional recurrent layers and an effective use of residual connections in deep recurrent networks. We achieve log-likelihood scores on natural images that are considerably better than the previous state of the art. Our main results also provide benchmarks on the diverse ImageNet dataset. Samples generated from the model appear crisp, varied and globally coherent.
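The sequential, discrete prediction described in the abstract can be illustrated with a minimal sketch. It assumes a single-channel image with 256 intensity levels visited in raster order, and it replaces the paper's recurrent network with a hypothetical stand-in, `toy_conditional`, that only looks at the previous pixel. The function names and the toy distribution are illustrative assumptions, not the authors' implementation.

```python
import math
import random

NUM_LEVELS = 256  # assumed 8-bit intensities, one discrete class per value


def toy_conditional(context):
    """Hypothetical stand-in for the network.

    Returns a probability distribution over the 256 possible values of the
    next pixel, given the list of pixels generated so far (raster order).
    """
    if not context:
        return [1.0 / NUM_LEVELS] * NUM_LEVELS
    center = context[-1]  # toy assumption: depend on the previous pixel only
    weights = [math.exp(-abs(v - center) / 16.0) for v in range(NUM_LEVELS)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_image(height, width, rng):
    """Draw pixels one at a time, each conditioned on all earlier pixels."""
    pixels = []
    for _ in range(height * width):
        probs = toy_conditional(pixels)
        value = rng.choices(range(NUM_LEVELS), weights=probs, k=1)[0]
        pixels.append(value)
    return [pixels[r * width:(r + 1) * width] for r in range(height)]


def log_likelihood(image):
    """Chain rule: sum of log p(x_i | x_1, ..., x_{i-1}) over all pixels."""
    flat = [v for row in image for v in row]
    total = 0.0
    for i, value in enumerate(flat):
        total += math.log(toy_conditional(flat[:i])[value])
    return total


if __name__ == "__main__":
    image = sample_image(4, 4, random.Random(0))
    print(image)
    print(log_likelihood(image))
```

Because every pixel's distribution is an explicit categorical over 256 values, the log-likelihood of an image is an exact sum rather than an approximation, which is the sense in which such a model is tractable.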

This paper has not been read by Pith yet.


Forward citations

Cited by 14 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. WaveNet: A Generative Model for Raw Audio

    cs.SD 2016-09 accept novelty 9.0

    WaveNet generates realistic raw audio using an autoregressive neural network with dilated convolutions, achieving state-of-the-art naturalness in speech synthesis for English and Mandarin.

  2. Density estimation using Real NVP

    cs.LG 2016-05 accept novelty 8.0

    Real NVP uses affine coupling layers to create invertible transformations that support exact density estimation, sampling, and latent inference without approximations.

  3. High-Resolution Image Synthesis with Latent Diffusion Models

    cs.CV 2021-12 conditional novelty 7.0

    Latent diffusion models achieve state-of-the-art inpainting and competitive results on unconditional generation, scene synthesis, and super-resolution by performing the diffusion process in the latent space of pretrai...

  4. Scaling Laws for Autoregressive Generative Modeling

    cs.LG 2020-10 accept novelty 7.0

    Autoregressive transformers follow power-law scaling laws for cross-entropy loss with nearly universal exponents relating optimal model size to compute budget across four domains.

  5. XLNet: Generalized Autoregressive Pretraining for Language Understanding

    cs.CL 2019-06 accept novelty 7.0

    XLNet is a generalized autoregressive pretraining method that learns bidirectional contexts via permutation-based factorization and outperforms BERT on 20 NLP tasks.

  6. Generating Long Sequences with Sparse Transformers

    cs.LG 2019-04 unverdicted novelty 7.0

    Sparse Transformers factorize attention to handle sequences tens of thousands long, achieving new SOTA density modeling on Enwik8, CIFAR-10, and ImageNet-64.

  7. SRC-Flow: Compact Semantic Representations Enable Normalizing Flows for Image Generation

    cs.CV 2026-05 unverdicted novelty 6.0

    SRC-Flow compresses RAE features into a low-dimensional semantic space with a Semantic Representation Compressor, enabling normalizing flows to achieve SOTA gFID scores of 1.65 and 2.07 on ImageNet 256x256 and 512x512...

  8. MPDiT: Multi-Patch Global-to-Local Transformer Architecture For Efficient Flow Matching and Diffusion Model

    cs.CV 2026-03 unverdicted novelty 6.0

    MPDiT uses a hierarchical multi-patch design in transformers to lower computation in diffusion models by handling coarse global features first then fine local details, plus faster-converging embeddings.

  9. Language Models (Mostly) Know What They Know

    cs.CL 2022-07 unverdicted novelty 6.0

    Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.

  10. A General Language Assistant as a Laboratory for Alignment

    cs.CL 2021-12 conditional novelty 6.0

    Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.

  11. Scaling Laws for Transfer

    cs.LG 2021-02 unverdicted novelty 6.0

    Effective data transferred from pre-training to fine-tuning is described by a power law in model parameter count and fine-tuning dataset size, acting like a multiplier on the fine-tuning data.

  12. ACPO: Anchor-Constrained Perceptual Optimization for Diffusion Models with No-Reference Quality Guidance

    cs.CV 2026-04 unverdicted novelty 5.0

    ACPO uses anchor-based regularization with NR-IQA guidance to enable stable perceptual quality improvements in diffusion model fine-tuning.

  13. Bayesian Volumetric Autoregressive generative models for better semisupervised learning

    cs.LG 2019-07 unverdicted novelty 5.0

    Volumetric PixelCNN reformulated as Bayesian deep GP yields uncertainty that improves semi-supervised learning on brain MRI with low label proportions.

  14. To each route its own ETA: A generative modeling framework for ETA prediction

    cs.LG 2019-06 unverdicted novelty 4.0

    A route-specific deep generative model learns the probability distribution of bus trip ETAs from historical data alone and conditions updates on real-time trip progress.