Learn-to-learn on Arbitrary Textual Conditioning: A Hypernetwork-Driven Meta-Gated LLM
Pith reviewed 2026-05-09 17:08 UTC · model grok-4.3
The pith
A hypernetwork produces a meta-signal β that gates the SwiGLU nonlinearity, letting LLMs adapt to arbitrary textual conditions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Activating a meta-signal β within the SwiGLU blocks creates a meta-gating mechanism that adaptively adjusts the nonlinearity of the feed-forward network. A hypernetwork dynamically produces β from textual conditions, providing meta-controllability over the LLM. Tested across condition types such as task, domain, persona, and style, the method outperforms finetuning and meta-learning baselines and generalizes reasonably to unseen tasks, condition types, and instructions.
What carries the argument
Hypernetwork that dynamically outputs the meta-signal β to gate the nonlinearity inside each SwiGLU block of the FFN.
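The abstract leaves β's exact role unspecified; a common reading is that β is the slope of the Swish gate inside SwiGLU, FFN(x) = W_down(Swish_β(W_gate x) ⊙ W_up x), with the hypernetwork mapping an encoded condition to a β per block. A minimal NumPy sketch under those assumptions (the scalar-β parameterization and the linear-plus-softplus hypernetwork are illustrative guesses, not details from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def swish(x, beta):
    # Swish_beta(x) = x * sigmoid(beta * x); sigmoid written via tanh for stability.
    return x * 0.5 * (1.0 + np.tanh(0.5 * beta * x))

def beta_gated_swiglu_ffn(x, W_gate, W_up, W_down, beta):
    # SwiGLU FFN whose gate nonlinearity is modulated by a scalar meta-signal beta.
    return (swish(x @ W_gate, beta) * (x @ W_up)) @ W_down

def hypernetwork(cond_emb, W_h, b_h):
    # Maps a condition embedding to a positive beta (softplus keeps beta > 0).
    return np.log1p(np.exp(cond_emb @ W_h + b_h))

d_model, d_ff, d_cond = 8, 16, 4
W_gate = rng.normal(size=(d_model, d_ff))
W_up   = rng.normal(size=(d_model, d_ff))
W_down = rng.normal(size=(d_ff, d_model))
W_h, b_h = rng.normal(size=(d_cond,)), 0.0

x = rng.normal(size=(d_model,))
cond = rng.normal(size=(d_cond,))  # stand-in for an encoded textual condition
beta = hypernetwork(cond, W_h, b_h)
y = beta_gated_swiglu_ffn(x, W_gate, W_up, W_down, beta)
```

A full implementation would presumably emit one β per SwiGLU block (or per channel) from a shared condition encoder; the single-scalar version above is only the simplest instance of the mechanism the review describes.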
If this is right
- The model handles textual conditions specifying tasks, domains, personas, and styles without requiring weight updates.
- Performance exceeds both direct finetuning and prior meta-learning baselines across the tested condition types.
- The approach extends reasonably to tasks, condition types, or instructions never seen during training.
- Meta-controllability is added while preserving the original LLM capabilities that would otherwise be lost to catastrophic forgetting.
Where Pith is reading between the lines
- The same gating signal could be applied to multiple model components beyond the FFN to increase the range of controllable behaviors.
- Because conditions are read as plain text, the method could be combined with retrieval or external memory to supply dynamic context at inference time.
- A single base model trained this way might replace families of separately finetuned models for different user groups or domains.
- The hypernetwork itself could be made smaller or shared across several LLMs to further lower the cost of adding new condition types.
Load-bearing premise
Dynamically generating the meta-signal β from arbitrary textual inputs via a hypernetwork produces stable adaptation without training instability or loss of the base model's original capabilities.
What would settle it
If training with the hypernetwork-generated β produces unstable loss curves or if inference on held-out benchmarks shows clear drops relative to the unmodified base model, the claimed controllability would not hold.
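One concrete probe of the capability-preservation half of that test, again assuming β enters as a Swish slope (an assumption, not stated in the abstract): β = 1 reduces the gate exactly to SiLU, so the base model's behavior survives whenever the hypernetwork emits β ≈ 1, while extreme β values visibly reshape the nonlinearity.

```python
import numpy as np

def swish(x, beta):
    # Swish_beta(x) = x * sigmoid(beta * x), via the stable tanh form of sigmoid.
    return x * 0.5 * (1.0 + np.tanh(0.5 * beta * x))

x = np.linspace(-4.0, 4.0, 9)

# beta = 1 recovers the standard SiLU gate, i.e. the unmodified SwiGLU block.
assert np.allclose(swish(x, 1.0), x / (1.0 + np.exp(-x)))

# Large beta sharpens the gate toward ReLU; beta -> 0 flattens it toward x/2.
assert np.max(np.abs(swish(x, 50.0) - np.maximum(x, 0.0))) < 1e-4
assert np.max(np.abs(swish(x, 1e-6) - x / 2.0)) < 1e-4
```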
read the original abstract
Conventional LLMs may suffer from corpus heterogeneity and subtle condition changes. While finetuning can create the catastrophe forgetting issue, application of meta-learning on LLMs is also limited due to its complexity and scalability. In this paper, we activate the meta-signal of $\beta$ within the SwiGLU blocks, resulting in a meta-gating mechanism that adaptively adjusts the nonlinearity of FFN. A hypernetwork is employed which dynamically produces $\beta$ on textual conditions, providing meta-controllability on LLMs. By testing on different condition types such as task, domain, persona, and style, our method outperforms finetuning and meta-learning baselines, and can generalize reasonably on unseen tasks, condition types, or instructions. Our code can be found in https://github.com/AaronJi/MeGan.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes activating a meta-signal β inside SwiGLU blocks to create a meta-gating mechanism that adaptively modulates the nonlinearity of the FFN in LLMs. A hypernetwork dynamically produces β from arbitrary textual conditions (task, domain, persona, style), enabling lightweight adaptation that the authors claim outperforms finetuning and meta-learning baselines while generalizing to unseen conditions; code is released at https://github.com/AaronJi/MeGan.
Significance. If the performance and generalization claims are substantiated, the approach could supply a scalable, parameter-efficient alternative to full finetuning or heavy meta-learning for conditioning LLMs on diverse textual inputs while reducing catastrophic forgetting. The open code release is a concrete strength that supports reproducibility and further investigation.
major comments (2)
- [Abstract] Abstract: the central claim that the method 'outperforms finetuning and meta-learning baselines' and 'can generalize reasonably on unseen tasks, condition types, or instructions' is stated without any quantitative metrics, baseline specifications, statistical significance, ablation studies, or tables. This absence leaves the primary empirical contribution unsupported by visible evidence.
- [Experiments] Experiments section (presumed §4–5): no details are supplied on the concrete performance numbers across condition types, how the hypernetwork-generated β affects training stability or base-model capabilities, compute overhead relative to baselines, or controls for the weakest assumption that the dynamic gating remains stable and effective.
minor comments (1)
- [Abstract] The term 'meta-controllability' is introduced without a precise definition or link to the architectural components; a short clarifying sentence would improve readability.
Simulated Author's Rebuttal
We thank the referee for their thorough review and valuable feedback on our manuscript. We address the major comments point by point below, and we will make revisions to improve the clarity and support of our empirical claims.
read point-by-point responses
-
Referee: [Abstract] Abstract: the central claim that the method 'outperforms finetuning and meta-learning baselines' and 'can generalize reasonably on unseen tasks, condition types, or instructions' is stated without any quantitative metrics, baseline specifications, statistical significance, ablation studies, or tables. This absence leaves the primary empirical contribution unsupported by visible evidence.
Authors: We agree with the referee that the abstract would be strengthened by including quantitative support for the central claims. In the revised manuscript, we will update the abstract to include key performance metrics from our experiments, such as relative improvements over baselines, and reference the specific tables and statistical tests used. revision: yes
-
Referee: [Experiments] Experiments section (presumed §4–5): no details are supplied on the concrete performance numbers across condition types, how the hypernetwork-generated β affects training stability or base-model capabilities, compute overhead relative to baselines, or controls for the weakest assumption that the dynamic gating remains stable and effective.
Authors: We will revise the Experiments section to provide more explicit details on the performance numbers for each condition type, an analysis of the effects of the hypernetwork-generated β on training stability and preservation of base-model capabilities, comparisons of compute overhead, and additional experiments or controls to demonstrate the stability and effectiveness of the dynamic gating mechanism. revision: yes
Circularity Check
No significant circularity in derivation chain
full rationale
The paper introduces an architectural mechanism: a hypernetwork that dynamically generates a meta-signal β to modulate the SwiGLU nonlinearity in FFN blocks, conditioned on arbitrary textual inputs (task, domain, persona, style). This is presented as an empirical engineering addition rather than a mathematical derivation. Performance claims rest on direct testing against finetuning and meta-learning baselines, with reported generalization to unseen conditions; no equation or derivation step reduces a 'prediction' to a fitted quantity defined in terms of itself, and the argument relies neither on load-bearing self-citations nor on imported uniqueness theorems. The construction is self-contained and externally verifiable via the released code.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
- [1] AI@Meta. Llama 3 Model Card. 2024. https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md
- [4] Bisk, Y., Zellers, R., Le Bras, R., Gao, J., and Choi, Y. PIQA: Reasoning about Physical Commonsense in Natural Language. AAAI 2020.
- [6] Charakorn, R., Cetin, E., Tang, Y., and Lange, R. T. Text-to-LoRA: Instant Transformer Adaption. ICML 2025. https://openreview.net/forum?id=zWskCdu3QA
- [7] Chen, M., Tworek, J., Jun, H., Yuan, Q., de Oliveira Pinto, H. P., Kaplan, J., Edwards, H., Burda, Y., Joseph, N., Brockman, G., et al. Evaluating Large Language Models Trained on Code. 2021.
- [12] Coda-Forno, J., Binz, M., Akata, Z., Botvinick, M., Wang, J., and Schulz, E. Meta-in-context Learning in Large Language Models. NeurIPS 2023, pp. 65189–65201.
- [13] Delétang, G., Ruoss, A., Duquenne, P., Catt, E., Genewein, T., Mattern, C., Grau-Moya, J., Wenliang, L. K., Aitchison, M., Orseau, L., Hutter, M., and Veness, J. Language Modeling Is Compression. ICLR 2024.
- [14] Finn, C., Abbeel, P., and Levine, S. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. ICML 2017, pp. 1126–1135. https://proceedings.mlr.press/v70/finn17a.html
- [15] Grattafiori, A., et al. The Llama 3 Herd of Models. 2024.
- [16] Ha, D., Dai, A. M., and Le, Q. V. HyperNetworks. ICLR 2017. https://openreview.net/forum?id=rkpACe1lx
- [17] He, J., Zhou, C., Ma, X., Berg-Kirkpatrick, T., and Neubig, G. Towards a Unified View of Parameter-Efficient Transfer Learning. ICLR 2022. https://openreview.net/forum?id=0RDcd5Axok
- [18] Hou, Q., Lee, M., Yu, G., and Cai, Y. Learning to Optimize Resource in Dynamic Wireless Environment via Meta-Gating Graph Neural Network. ISWCS 2022, pp. 1–6. doi:10.1109/ISWCS56560.2022.9940416
- [19] Hou, Q., Lee, M., Yu, G., and Cai, Y. Meta-Gating Framework for Fast and Continuous Resource Optimization in Dynamic Wireless Environments. IEEE Transactions on Communications, 71(9):5259–5273, 2023. doi:10.1109/TCOMM.2023.3292257
- [21] Houlsby, N., Giurgiu, A., Jastrzebski, S., Morrone, B., De Laroussilhe, Q., Gesmundo, A., Attariyan, M., and Gelly, S. Parameter-Efficient Transfer Learning for NLP. ICML 2019, pp. 2790–2799.
- [22] Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., and Chen, W. LoRA: Low-Rank Adaptation of Large Language Models. ICLR 2022. https://openreview.net/forum?id=nZeVKeeFYf9
- [25] Sakaguchi, K., Le Bras, R., Bhagavatula, C., and Choi, Y. WinoGrande: An Adversarial Winograd Schema Challenge at Scale. 2019.
- [29] Li, W., Zou, L., Tang, M., Yu, Q., Li, W., and Li, C. META-LORA: Memory-Efficient Sample Reweighting for Fine-Tuning Large Language Models. COLING 2025, pp. 8504–8517.
- [30] Lin, C.-Y. ROUGE: A Package for Automatic Evaluation of Summaries. Text Summarization Branches Out, pp. 74–81, 2004.
- [31] Lin, S., Yang, L., He, Z., Fan, D., and Zhang, J. MetaGater: Fast Learning of Conditional Channel Gated Networks via Federated Meta-Learning. IEEE MASS 2021, pp. 164–172. doi:10.1109/MASS52906.2021.00031
- [32] Liu, J., Xia, C. S., Wang, Y., and Zhang, L. Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation. NeurIPS 2023. https://openreview.net/forum?id=1qvx610Cu7
- [33] Mihaylov, T., Clark, P., Khot, T., and Sabharwal, A. Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering. EMNLP 2018.
- [36] Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. BLEU: A Method for Automatic Evaluation of Machine Translation. ACL 2002, pp. 311–318.
- [37] Ponce, D. and Etchegoyhen, T. In-context Learning vs. Instruction Tuning: The Case of Small and Multilingual Language Models. 2025. https://arxiv.org/abs/2503.01611
- [39] Qiu, Z., Wang, Z., Zheng, B., Huang, Z., Wen, K., Yang, S., Men, R., Yu, L., Huang, F., Huang, S., Liu, D., Zhou, J., and Lin, J. Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free. NeurIPS 2025. https://openreview.net/forum?id=1b7whO4SfY
- [40] Qwen Team, Alibaba Group. Qwen2 Technical Report. 2024.
- [41] Rao, S. and Tetreault, J. Dear Sir or Madam, May I Introduce the GYAFC Dataset: Corpus, Benchmarks and Metrics for Formality Style Transfer. NAACL 2018.
- [43] Schmidhuber, J. Driven by Compression Progress: A Simple Principle Explains Essential Aspects of Subjective Beauty, Novelty, Surprise, Interestingness, Attention, Curiosity, Creativity, Art, Science, Music, Jokes. Anticipatory Behavior in Adaptive Learning Systems, pp. 48–76, 2009.
- [45] Silver, R. A. Neuronal Arithmetic. Nature Reviews Neuroscience, 11:474–489, 2010. https://api.semanticscholar.org/CorpusID:205505926
- [47] Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C. D., Ng, A., and Potts, C. Recursive Deep Models for Semantic Compositionality over a Sentiment Treebank. EMNLP 2013, pp. 1631–1642.
- [50] Sutskever, I. An Observation on Generalization. Simons Institute workshop on Large Language Models and Transformers, 2023. https://simons.berkeley.edu/talks/ilya-sutskever-openai-2023-08-14
- [51] Tian, C., Shi, Z., Guo, Z., Li, L., and Xu, C.-Z. HydraLoRA: An Asymmetric LoRA Architecture for Efficient Fine-Tuning. NeurIPS 2024. https://openreview.net/forum?id=qEpi8uWX3N
- [52] Tishby, N. and Zaslavsky, N. Deep Learning and the Information Bottleneck Principle. IEEE ITW 2015, pp. 1–5. doi:10.1109/ITW.2015.7133169
- [56] Wang, Y., Mishra, S., Alipoormolabashi, P., Kordi, Y., Mirzaei, A., Naik, A., Ashok, A., Dhanasekaran, A. S., Arunkumar, A., Stap, D., et al. Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks. EMNLP 2022, pp. 5085–5109.
- [57] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Xia, F., Chi, E., Le, Q. V., Zhou, D., et al. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. NeurIPS 2022, 35:24824–24837.
- [61] Zellers, R., Holtzman, A., Bisk, Y., Farhadi, A., and Choi, Y. HellaSwag: Can a Machine Really Finish Your Sentence? ACL 2019.
- [68] Mistral 7B. arXiv:2310.06825.
- [69] LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models. arXiv:2403.13372.
- [70] Efficient Memory Management for Large Language Model Serving with PagedAttention. Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles.
- [71] Metadata Conditioning Accelerates Language Model Pre-training. Forty-second International Conference on Machine Learning.
- [73] Reinhart, A., Markey, B., Laudenbach, M., Pantusen, K., Yurko, R., Weinberg, G., and Brown, D. W. Proceedings of the National Academy of Sciences, 2025. https://www.pnas.org/doi/pdf/10.1073/pnas.2422455122
- [74] Zamaraeva, O., Flickinger, D., Bond, F., and Gómez-Rodríguez, C. Comparing LLM-Generated and Human-Authored News Text Using Formal Syntactic Theory. ACL 2025. doi:10.18653/v1/2025.acl-long.443
- [75] Srinivas, V., Xu, X., Liu, X., Ayush, K., Galatzer-Levy, I., Patel, S., McDuff, D., and Althoff, T. Substance over Style: Evaluating Proactive Conversational Coaching Agents. ACL 2025.
- [76] A Framework for Authorship Identification of Online Messages: Writing-Style Features and Classification Techniques. 2006.
- [77] Li, J., Zhang, Z., Chen, X., Zhao, D., and Yan, R. Stylized Dialogue Generation with Feature-Guided Knowledge Augmentation. Findings of EMNLP 2023. doi:10.18653/v1/2023.findings-emnlp.475
- [78] Saha, S., Das, S., and Srihari, R. Stylistic Response Generation by Controlling Personality Traits and Intent. 4th Workshop on NLP for Conversational AI, 2022. doi:10.18653/v1/2022.nlp4convai-1.16
- [79] Yang, Z., Wu, W., Xu, C., Liang, X., Bai, J., Wang, L., Wang, W., and Li, Z. StyleDGPT: Stylized Response Generation with Pre-trained Language Models. Findings of EMNLP 2020. doi:10.18653/v1/2020.findings-emnlp.140
- [80] Paraphrasing for Style. COLING.
- [82] Ziems, C., Yu, J., Wang, Y.-C., Halevy, A., and Yang, D. The Moral Integrity Corpus: A Benchmark for Ethical Dialogue Systems. ACL 2022. doi:10.18653/v1/2022.acl-long.261
- [83] Liu, S., Zheng, C., Demasi, O., Sabour, S., Li, Y., Yu, Z., Jiang, Y., and Huang, M. Towards Emotional Support Dialog Systems. ACL-IJCNLP 2021.
- [84] Li, Y., Su, H., Shen, X., Li, W., Cao, Z., and Niu, S. DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset. IJCNLP 2017.
- [85] Emotion Detection on TV Show Transcripts with Sequence-Based Convolutional Neural Networks. AAAI Workshops.
- [86] Rashkin, H., Smith, E. M., Li, M., and Boureau, Y.-L. Towards Empathetic Open-Domain Conversation Models: A New Benchmark and Dataset. ACL 2019. doi:10.18653/v1/P19-1534
- [88] Maas, A. L., Daly, R. E., Pham, P. T., Huang, D., Ng, A. Y., and Potts, C. Learning Word Vectors for Sentiment Analysis. ACL-HLT 2011.
- [89] Sentiment Analysis Using Product Review Data. Journal of Big Data.
- [90] Adaptive Prompt Routing for Arbitrary Text Style Transfer with Pre-trained Language Models. AAAI Conference on Artificial Intelligence.
- [93] A Diversity-Promoting Objective Function for Neural Conversation Models. arXiv:1510.03055.
- [94] Aston-Jones, G. and Cohen, J. D. An Integrative Theory of Locus Coeruleus–Norepinephrine Function: Adaptive Gain and Optimal Performance. Annual Review of Neuroscience, 2005. doi:10.1146/annurev.neuro.28.061604.135709
- [95] Servan-Schreiber, D., Printz, H., and Cohen, J. D. Science, 1990. https://www.science.org/doi/pdf/10.1126/science.2392679
- [101] Ye, Q., Lin, B. Y., and Ren, X. CrossFit: A Few-Shot Learning Challenge for Cross-Task Generalization in NLP. EMNLP 2021. doi:10.18653/v1/2021.emnlp-main.572
- [102] Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., and Hajishirzi, H. UnifiedQA: Crossing Format Boundaries with a Single QA System. Findings of EMNLP 2020. doi:10.18653/v1/2020.findings-emnlp.171
- [105] Zhang, S., Dinan, E., Urbanek, J., Szlam, A., Kiela, D., and Weston, J. Personalizing Dialogue Agents: I Have a Dog, Do You Have Pets Too? ACL 2018. doi:10.18653/v1/P18-1205
- [106] Yu, T., Liu, Z., and Fung, P. AdaptSum: Towards Low-Resource Domain Adaptation for Abstractive Summarization. NAACL 2021. doi:10.18653/v1/2021.naacl-main.471
- [107] Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., and Tafjord, O. Think You Have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. arXiv:1803.05457.
- [108] BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions. arXiv:1905.10044.
- [109] Training Verifiers to Solve Math Word Problems. arXiv:2110.14168.