SleepNet and DreamNet: Enriching and Reconstructing Representations for Consolidated Visual Classification

Mingze Ni; Wei Liu

arXiv: 2409.01633 · v4 · submitted 2024-09-03 · cs.LG · cs.AI · cs.CV

Pith reviewed 2026-05-23 20:59 UTC · model grok-4.3

classification cs.LG · cs.AI · cs.CV

keywords SleepNet · DreamNet · feature enrichment · representation reconstruction · visual classification · pre-trained encoders · encoder-decoder models · deep learning

The pith

SleepNet and DreamNet improve visual classification by enriching pre-trained encoder outputs and reconstructing hidden states.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces SleepNet to combine supervised learning directly with features from pre-trained encoders, producing stronger representations for classification. DreamNet builds on this by adding pre-trained encoder-decoder pairs that reconstruct hidden states, which the authors say allows deeper consolidation and refinement of those representations. Experiments reported in the paper show both models outperforming existing state-of-the-art methods on visual tasks. A sympathetic reader would see this as a practical way to leverage existing pre-trained models without discarding their outputs. The central mechanism is the enrichment step in SleepNet plus the reconstruction step in DreamNet.

Core claim

SleepNet integrates supervised learning with representations obtained from pre-trained encoders to achieve stronger and more robust feature learning. DreamNet extends the approach by incorporating pre-trained encoder-decoder frameworks to reconstruct hidden states, enabling deeper consolidation and refinement of visual representations. The authors claim these enrichment and reconstruction strategies produce consistently superior performance compared with existing state-of-the-art methods.

What carries the argument

SleepNet's supervised integration of pre-trained encoder outputs, combined with DreamNet's encoder-decoder reconstruction of hidden states. Together, the two steps consolidate visual representations for classification.
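As a rough illustration of the two mechanisms, the Python sketch below wires a toy enrichment step and a toy reconstruction step. Everything in it is an assumption made for illustration: the additive fusion, the layer sizes, the helper names (`sleep_block`, `dream_block`, `Autoencoder`), and the randomly initialised weights that stand in for a pre-trained model. This summary does not specify how the paper's blocks are actually wired, so the sketch should not be read as the authors' implementation.

```python
import random

def linear(weights, x):
    """Dense layer without bias: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def relu(x):
    return [max(0.0, v) for v in x]

def init(rows, cols, rng, scale=0.1):
    """Small random weight matrix (hypothetical helper)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

class Autoencoder:
    """Stand-in for a pre-trained encoder-decoder pair.

    The weights are random here; a real setup would load pre-trained ones.
    """
    def __init__(self, dim, code_dim, rng):
        self.enc = init(code_dim, dim, rng)
        self.dec = init(dim, code_dim, rng)

    def encode(self, h):
        return relu(linear(self.enc, h))

    def decode(self, z):
        return linear(self.dec, z)

def sleep_block(h, trunk_w, ae, proj_w):
    """Enrichment (assumed form): trainable layer output plus projected encoder features."""
    trunk = relu(linear(trunk_w, h))
    enriched = linear(proj_w, ae.encode(h))
    return [a + b for a, b in zip(trunk, enriched)]

def dream_block(h, trunk_w, ae):
    """Reconstruction (assumed form): trainable layer output plus the reconstruction of h."""
    trunk = relu(linear(trunk_w, h))
    recon = ae.decode(ae.encode(h))
    return [a + b for a, b in zip(trunk, recon)]

# Toy usage with arbitrary sizes.
rng = random.Random(0)
dim, code_dim, n_classes = 8, 4, 3
ae = Autoencoder(dim, code_dim, rng)
trunk_w = init(dim, dim, rng)
proj_w = init(dim, code_dim, rng)
head_w = init(n_classes, dim, rng)

x = [rng.uniform(-1, 1) for _ in range(dim)]
h_sleep = sleep_block(x, trunk_w, ae, proj_w)   # enriched hidden state
h_dream = dream_block(x, trunk_w, ae)           # hidden state with reconstruction added
logits = linear(head_w, h_dream)                # classification head
```

The only difference between the two blocks in this sketch is whether the side path stops at the encoder (enrichment) or continues through the decoder (reconstruction), which mirrors the distinction the summary draws between SleepNet and DreamNet.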

If this is right

Enriching pre-trained representations through supervised learning strengthens feature robustness for classification.
Reconstructing hidden states via encoder-decoder pairs allows deeper refinement of visual features.
The two models together outperform existing state-of-the-art methods on visual tasks.
Feature enrichment and reconstruction strategies improve overall representation utilization in deep networks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The method could be tested on tasks beyond classification, such as detection or segmentation, to check if reconstruction helps there too.
If the gains hold on larger or more varied datasets, the approach might reduce reliance on training entirely new models from scratch.
The reconstruction step might interact differently with various pre-trained backbones, which would be worth checking in follow-up experiments.

Load-bearing premise

The specific combination of supervised fine-tuning on pre-trained encoder outputs plus hidden-state reconstruction via encoder-decoder pairs will produce robust gains without introducing overfitting.

What would settle it

Training SleepNet and DreamNet on standard visual classification benchmarks and finding they fail to exceed current state-of-the-art accuracy would falsify the performance claim.
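One way to make that check concrete is to compare per-run accuracies of a model against a baseline and look at the gap relative to run-to-run spread. The sketch below computes the mean difference and Welch's t-statistic using only the Python standard library. The accuracy values are made-up placeholders, not results from the paper, and the function name `welch_t` is chosen here for illustration.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(model_accs, baseline_accs):
    """Welch's t-statistic for two independent sets of per-run accuracies.

    Needs at least two runs per set and nonzero variance in at least one set.
    """
    standard_error = sqrt(
        variance(model_accs) / len(model_accs)
        + variance(baseline_accs) / len(baseline_accs)
    )
    return (mean(model_accs) - mean(baseline_accs)) / standard_error

# Made-up placeholder accuracies (NOT results from the paper), one per seed.
model_runs = [0.80, 0.82, 0.81]
baseline_runs = [0.78, 0.79, 0.80]

delta = mean(model_runs) - mean(baseline_runs)
t_stat = welch_t(model_runs, baseline_runs)
print(f"delta = {delta:.3f}, Welch t = {t_stat:.2f}")
```

A delta that is small relative to the spread across runs (a t-statistic near zero) would undercut the performance claim, and a negative delta would contradict it. The number of runs and the choice of baselines are left open here.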

Figures

Figures reproduced from arXiv: 2409.01633 by Mingze Ni, Wei Liu.

**Figure 2.** Overview of the Visual SleepNet Architecture, featuring M “Sleep Blocks” that are constructed by chain-like ...

**Figure 3.** Overview of the Textual SleepNet Architecture, featuring M “Sleep Blocks” that are constructed by chain-like ...

**Figure 4.** Overview of the Visual DreamNet Architecture, integrating “Dream Blocks” that include chain-like blocks ...

**Figure 5.** Overview of the Textual DreamNet Architecture, integrating “Dream Blocks” that include by chain-like ...

**Figure 6.** Stages of image transformation by DreamNet-3 using a masked autoencoder (MAE) ...

**Figure 7.** Ablation studies for testing the different chain-like blocks by comparing the original performance of various ...

**Figure 8.** Ablation studies for testing the different pre-trained encoders/autoencoders by comparing the original ...

**Figure 9.** This plot delineates the impact of freezing (marked in blue) and unfreezing (indicated in orange) the parameters ...

Original abstract

An effective integration of rich feature representations with robust classification mechanisms remains a key challenge in visual understanding tasks. This study introduces two novel deep learning models, SleepNet and DreamNet, which are designed to improve representation utilization through feature enrichment and reconstruction strategies. SleepNet integrates supervised learning with representations obtained from pre-trained encoders, leading to stronger and more robust feature learning. Building on this foundation, DreamNet incorporates pre-trained encoder-decoder frameworks to reconstruct hidden states, allowing deeper consolidation and refinement of visual representations. Our experiments show that our models consistently achieve superior performance compared with existing state-of-the-art methods, demonstrating the effectiveness of the proposed enrichment and reconstruction approaches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript introduces SleepNet, which performs supervised fine-tuning on outputs from pre-trained encoders to enrich feature representations for visual classification, and DreamNet, which adds an encoder-decoder framework to reconstruct hidden states for further consolidation and refinement. The central claim is that these enrichment and reconstruction strategies yield models that consistently outperform existing state-of-the-art methods on visual understanding tasks.

Significance. If the performance gains are shown to be robust, attributable to the proposed mechanisms rather than confounding factors, and supported by proper controls, the work could provide a practical approach to improving representation utilization in vision models through supervised enrichment and hidden-state reconstruction.

major comments (2)

[Abstract] The assertion that 'our models consistently achieve superior performance compared with existing state-of-the-art methods' is presented without any datasets, baselines, quantitative metrics, deltas, error bars, or ablation results, so the central empirical claim cannot be evaluated from the provided text.
[Abstract] The weakest assumption—that the specific combination of supervised fine-tuning on pre-trained encoder outputs plus hidden-state reconstruction produces robust gains without overfitting or unreported hyperparameter tuning—is unsupported, as no training protocols, regularization details, or ablation studies isolating the reconstruction step are described.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed feedback on the abstract. The full manuscript provides extensive experimental validation, but we agree the abstract can be strengthened for clarity and have revised it accordingly while preserving its concise nature.

Point-by-point responses

Referee: [Abstract] The assertion that 'our models consistently achieve superior performance compared with existing state-of-the-art methods' is presented without any datasets, baselines, quantitative metrics, deltas, error bars, or ablation results, so the central empirical claim cannot be evaluated from the provided text.

Authors: The abstract is intentionally high-level to summarize the contribution within length limits. The full paper (Sections 4-5) reports results on standard datasets such as ImageNet and CIFAR-100, with comparisons to multiple SOTA baselines, quantitative metrics (accuracy, top-1/top-5), deltas, error bars from multiple runs, and ablation studies. To address the concern, we have revised the abstract to include one key quantitative highlight and a reference to the evaluation protocol.
Revision: yes

Referee: [Abstract] The weakest assumption—that the specific combination of supervised fine-tuning on pre-trained encoder outputs plus hidden-state reconstruction produces robust gains without overfitting or unreported hyperparameter tuning—is unsupported, as no training protocols, regularization details, or ablation studies isolating the reconstruction step are described.

Authors: All training protocols, hyperparameter choices, regularization (e.g., dropout, weight decay), and ablation studies isolating the reconstruction component of DreamNet are detailed in the Methods and Experiments sections. These controls demonstrate that gains are attributable to the proposed mechanisms rather than overfitting. The abstract summarizes rather than replicates these details; we have added a brief clause noting that robustness is verified via ablations in the revised version.
Revision: partial
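For readers unfamiliar with the top-1/top-5 metrics mentioned in the first response, the sketch below shows how top-k accuracy is conventionally computed from per-class scores. It is a generic illustration with made-up scores and a hypothetical function name (`top_k_accuracy`), not the paper's evaluation code.

```python
def top_k_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score, truncated to k.
        top = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top
    return hits / len(labels)

# Made-up scores for three samples over three classes (illustrative only).
scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
]
labels = [1, 1, 0]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
```

Top-k accuracy can only stay the same or rise as k grows, so a top-5 figure is always at least as high as the top-1 figure on the same predictions.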

Circularity Check

0 steps flagged

No circularity: empirical performance claims rest on experiments, not on any self-referential derivation or fitted input renamed as prediction.

Full rationale

The paper introduces SleepNet and DreamNet as architectural combinations of supervised fine-tuning on pre-trained encoders plus encoder-decoder reconstruction of hidden states. These are presented as design choices whose value is asserted via experimental comparison to SOTA baselines. No equations, uniqueness theorems, ansatzes, or parameter-fitting steps appear in the provided text that would reduce a claimed result to its own inputs by construction. The central claim is therefore an empirical assertion whose validity depends on unreported experimental details rather than on any definitional or self-citation loop.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are stated in the abstract; the claim rests entirely on an unreported empirical comparison.


[1] [1]

The sleep-immune crosstalk in health and disease

Luciana Besedovsky, Tanja Lange, and Monika Haack. The sleep-immune crosstalk in health and disease. Physiological reviews, 2019

work page 2019

[2] [2]

Beyond dreams: do sleep-related movements contribute to brain development? Frontiers in Neurology, 1:140, 2010

Mark S Blumberg. Beyond dreams: do sleep-related movements contribute to brain development? Frontiers in Neurology, 1:140, 2010

work page 2010

[3] [3]

Function of dreams

Louis Breger. Function of dreams. Journal of Abnormal Psychology, 72(5p2):1, 1967

work page 1967

[4] [4]

Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-V oss, Gretchen Krueger, T. J. Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeff Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin...

work page internal anchor Pith review Pith/arXiv arXiv 2005

[5] [5]

An empirical survey of data augmentation for limited data learning in nlp

Jiaao Chen, Derek Tam, Colin Raffel, Mohit Bansal, and Diyi Yang. An empirical survey of data augmentation for limited data learning in nlp. Transactions of the Association for Computational Linguistics , 11:191–211, 2023

work page 2023

[6] [6]

Randaugment: Practical automated data augmentation with a reduced search space

Ekin D Cubuk, Barret Zoph, Jonathon Shlens, and Quoc V Le. Randaugment: Practical automated data augmentation with a reduced search space. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, pages 702–703, 2020

work page 2020

[7] [7]

Coatnet: Marrying convolution and attention for all data sizes

Zihang Dai, Hanxiao Liu, Quoc V Le, and Mingxing Tan. Coatnet: Marrying convolution and attention for all data sizes. Advances in neural information processing systems , 34:3965–3977, 2021

work page 2021

[8] [8]

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. ArXiv, abs/1810.04805, 2019

work page internal anchor Pith review Pith/arXiv arXiv 2019

[9] [9]

A new neurocognitive theory of dreams

G William Domhoff. A new neurocognitive theory of dreams. Dreaming, 11:13–33, 2001

work page 2001

[10] [11]

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. ArXiv, abs/2010.11929, 2020

work page internal anchor Pith review Pith/arXiv arXiv 2010

[11] [12]

The neuroprotective aspects of sleep

Andy R Eugene and Jolanta Masiak. The neuroprotective aspects of sleep. MEDtube science, 3(1):35, 2015

work page 2015

[12] [13]

The role of sleep in emotional brain function

Andrea N Goldstein and Matthew P Walker. The role of sleep in emotional brain function. Annual review of clinical psychology, 10:679–708, 2014

work page 2014

[13] [14]

Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron C

Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron C. Courville, and Yoshua Bengio. Generative adversarial nets. In NIPS, 2014

work page 2014

[14] [15]

Girshick

Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Doll’ar, and Ross B. Girshick. Masked autoencoders are scalable vision learners. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , pages 15979–15988, 2021

work page 2022

[15] [16]

Zhang, Shaoqing Ren, and Jian Sun

Kaiming He, X. Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , pages 770–778, 2015

work page 2016

[16] [17]

Masking augmentation for supervised learning

Byeongho Heo, Taekyung Kim, Sangdoo Yun, and Dongyoon Han. Masking augmentation for supervised learning. arXiv preprint arXiv:2306.11339, 2023

work page arXiv 2023

[17] [18]

MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. Mobilenets: Efficient convolutional neural networks for mobile vision applications. ArXiv, abs/1704.04861, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[18] [19]

Trans-blstm: Transformer with bidirectional lstm for language understanding

Zhiheng Huang, Peng Xu, Davis Liang, Ajay Mishra, and Bing Xiang. Trans-blstm: Transformer with bidirectional lstm for language understanding. arXiv preprint arXiv:2003.07000, 2020

work page arXiv 2003

[19] [20]

Ilyas, S

Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Logan Engstrom, Brandon Tran, and Aleksander Madry. Adversarial examples are not bugs, they are features. ArXiv, abs/1905.02175, 2019

work page arXiv 1905

[20] [21]

John M. Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, Russ Bates, Augustin Zídek, Anna Potapenko, Alex Bridgland, Clemens Meyer, Simon A A Kohl, Andy Ballard, Andrew Cowie, Bernardino Romera-Paredes, Stanislav Nikolov, Rishub Jain, Jonas Adler, Trevor Back, Stig Petersen, David A. Reiman,...

work page 2021

[21] [22]

Convolutional neural networks for sentence classification

Yoon Kim. Convolutional neural networks for sentence classification. In Conference on Empirical Methods in Natural Language Processing, 2014

work page 2014

[22] [23]

Adam: A Method for Stochastic Optimization

Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. CoRR, abs/1412.6980, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014

[23] [24]

Auto-Encoding Variational Bayes

Diederik P. Kingma and Max Welling. Auto-encoding variational bayes. CoRR, abs/1312.6114, 2013

work page internal anchor Pith review Pith/arXiv arXiv 2013

[24] [25]

Learning multiple layers of features from tiny images

Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009

work page 2009

[25] [26]

Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. Imagenet classification with deep convolutional neural networks. Communications of the ACM, 60:84 – 90, 2012. 17 Running Title for Header

work page 2012

[26] [27]

Ya Le and Xuan S. Yang. Tiny imagenet visual recognition challenge. 2015

work page 2015

[27] [28]

RoBERTa: A Robustly Optimized BERT Pretraining Approach

Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettle- moyer, and Veselin Stoyanov. Roberta: A robustly optimized BERT pretraining approach. CoRR, abs/1907.11692, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1907

[28] [29]

Maas, Raymond E

Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y . Ng, and Christopher Potts. Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Com- putational Linguistics: Human Language Technologies , pages 142–150, Portland, Oregon, USA, June 2011. Association for Computational Linguistics

work page 2011

[29] [30]

Corrado, and Jeffrey Dean

Tomas Mikolov, Kai Chen, Gregory S. Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. In International Conference on Learning Representations , 2013

[30] [31]

Textattack: A framework for adversarial attacks, data augmentation, and adversarial training in nlp

John Morris, Eli Lifland, Jin Yong Yoo, Jake Grigsby, Di Jin, and Yanjun Qi. Textattack: A framework for adversarial attacks, data augmentation, and adversarial training in nlp. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 119–126, 2020

[31] [32]

Giant panda eating

National Geographic. Giant panda eating, 2024. Accessed: 2024-06-04

[32] [33]

Deep contextualized word representations

Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. Deep contextualized word representations. ArXiv, abs/1802.05365, 2018

[33] [34]

About sleep’s role in memory

Björn Rasch and Jan Born. About sleep’s role in memory. Physiological Reviews, 93(2):681–766, 2013

[34] [35]

ImageNet Large Scale Visual Recognition Challenge

Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV), 115(3):211–252, 2015

[35] [36]

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. ArXiv, abs/1910.01108, 2019

[36] [37]

Very Deep Convolutional Networks for Large-Scale Image Recognition

Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556, 2014

[37] [38]

Going deeper with convolutions

Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott E. Reed, Dragomir Anguelov, D. Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1–9, 2015

[38] [39]

Efficientnet: Rethinking model scaling for convolutional neural networks

Mingxing Tan and Quoc Le. Efficientnet: Rethinking model scaling for convolutional neural networks. In International Conference on Machine Learning, pages 6105–6114. PMLR, 2019

[39] [40]

Augmenting convolutional networks with attention-based aggregation

Hugo Touvron, Matthieu Cord, Alaaeldin El-Nouby, Piotr Bojanowski, Armand Joulin, Gabriel Synnaeve, and Hervé Jégou. Augmenting convolutional networks with attention-based aggregation. arXiv preprint arXiv:2112.13692, 2021

[40] [41]

Attention is all you need

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in neural information processing systems, 30, 2017

[41] [42]

Cvt: Introducing convolutions to vision transformers

Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, and Lei Zhang. CvT: Introducing convolutions to vision transformers. In Proceedings of the IEEE/CVF international conference on computer vision, pages 22–31, 2021

[42] [43]

Xlnet: Generalized autoregressive pretraining for language understanding

Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Russ R Salakhutdinov, and Quoc V Le. Xlnet: Generalized autoregressive pretraining for language understanding. Advances in neural information processing systems, 32, 2019

[43] [44]

Understanding how pretraining regularizes deep learning algorithms

Yu Yao, Baosheng Yu, Chen Gong, and Tongliang Liu. Understanding how pretraining regularizes deep learning algorithms. IEEE Transactions on Neural Networks and Learning Systems, 2021

[44] [45]

Scaling vision transformers

Xiaohua Zhai, Alexander Kolesnikov, Neil Houlsby, and Lucas Beyer. Scaling vision transformers. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 12104–12113, 2022

[45] [46]

mixup: Beyond Empirical Risk Minimization

Hongyi Zhang, Moustapha Cisse, Yann N Dauphin, and David Lopez-Paz. mixup: Beyond empirical risk minimization. arXiv preprint arXiv:1710.09412, 2017

[46] [47]

Character-level convolutional networks for text classification

Xiang Zhang, Junbo Jake Zhao, and Yann LeCun. Character-level convolutional networks for text classification. In NIPS, 2015

[47] [48]

Character-level convolutional networks for text classification

Xiang Zhang, Junbo Jake Zhao, and Yann LeCun. Character-level convolutional networks for text classification. In NIPS, 2015

Biographies

Mingze Ni is a Postdoctoral Researcher in machine learning at the University of Technology Sydney. He graduated from the Australian National University with bachelor’s degrees in science (St...