pith. machine review for the scientific record.

arxiv: 2605.01973 · v1 · submitted 2026-05-03 · 💻 cs.CL · cs.LG

Recognition: unknown

Learn-to-learn on Arbitrary Textual Conditioning: A Hypernetwork-Driven Meta-Gated LLM

Authors on Pith: no claims yet

Pith reviewed 2026-05-09 17:08 UTC · model grok-4.3

classification 💻 cs.CL cs.LG
keywords hypernetwork · meta-gating · SwiGLU · textual conditioning · LLM adaptation · meta-learning · controllable generation

The pith

A hypernetwork produces a meta-signal β that gates the SwiGLU nonlinearity, letting LLMs adapt to arbitrary textual conditions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes activating a controllable meta-signal β inside the SwiGLU blocks of an LLM's feed-forward network, which then modulates the strength of the nonlinearity applied during computation. A separate hypernetwork reads any textual condition and outputs the appropriate β value on the fly, giving the base model a form of meta-controllability without weight updates. This design is tested on conditions that specify tasks, domains, personas, and styles, where it beats both standard finetuning and existing meta-learning methods while showing reasonable success on conditions never encountered in training. The approach directly tackles corpus heterogeneity and the risk of catastrophic forgetting that comes with repeated finetuning. If the mechanism works as described, it supplies a lightweight route for steering large models toward new behaviors using only natural-language instructions.
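A minimal sketch of the mechanism in PyTorch, to make the moving parts concrete. The abstract does not give the exact parameterization; following the Figure 7 note that β = 0 recovers SiLU, the sketch assumes the gate activation x·σ((1+β)x), and the module names and shapes are illustrative guesses rather than the paper's released code.

```python
import torch
import torch.nn as nn

class MetaGatedSwiGLU(nn.Module):
    """SwiGLU feed-forward block whose nonlinearity is modulated by a
    per-layer meta-signal beta produced by a separate hypernetwork.

    Assumed gate: g * sigmoid((1 + beta) * g), so beta = 0 recovers the
    standard SiLU. This parameterization is an editorial assumption
    consistent with Figure 7, not a formula quoted from the paper.
    """

    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_ff, bias=False)
        self.w_up = nn.Linear(d_model, d_ff, bias=False)
        self.w_down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor, beta: torch.Tensor) -> torch.Tensor:
        g = self.w_gate(x)                           # gate pre-activation
        gated = g * torch.sigmoid((1.0 + beta) * g)  # beta-modulated SiLU
        return self.w_down(gated * self.w_up(x))     # standard SwiGLU product
```

With β held at 0 the block reduces to an ordinary SwiGLU FFN, which is what would let the mechanism sit on top of a pretrained model without touching its weights.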

Core claim

Activating the meta-signal β within the SwiGLU blocks creates a meta-gating mechanism that adaptively adjusts the nonlinearity of the feed-forward network. A hypernetwork dynamically produces β from textual conditions, providing meta-controllability over the LLM. Tested on condition types including task, domain, persona, and style, the method outperforms finetuning and meta-learning baselines and generalizes reasonably to unseen tasks, condition types, and instructions.

What carries the argument

A hypernetwork that dynamically outputs the meta-signal β to gate the nonlinearity inside each SwiGLU block of the FFN.
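A hedged sketch of that hypernetwork, assuming (per the Figure 1 caption) that an embedding of the condition text is combined with a learned layer-index embedding and mapped to one β per FFN layer; the encoder choice, MLP head, and dimensions are editorial placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn

class BetaHypernetwork(nn.Module):
    """Maps an encoded textual condition to one meta-signal beta per layer.

    `cond` is assumed to be a fixed-size embedding of the condition text
    from any frozen encoder. Adding a learned layer-index embedding
    follows the Figure 1 caption; the small MLP head is an assumption.
    """

    def __init__(self, cond_dim: int, num_layers: int, hidden: int = 256):
        super().__init__()
        self.num_layers = num_layers
        self.layer_emb = nn.Embedding(num_layers, cond_dim)
        self.head = nn.Sequential(
            nn.Linear(cond_dim, hidden),
            nn.SiLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, cond: torch.Tensor) -> torch.Tensor:
        # cond: (batch, cond_dim)  ->  betas: (batch, num_layers)
        idx = torch.arange(self.num_layers, device=cond.device)
        per_layer = cond.unsqueeze(1) + self.layer_emb(idx)
        return self.head(per_layer).squeeze(-1)
```

If only this module is trained, the base model's weights stay frozen, which matches the pith's reading that adaptation happens without weight updates.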

If this is right

  • The model handles textual conditions specifying tasks, domains, personas, and styles without requiring weight updates.
  • Performance exceeds both direct finetuning and prior meta-learning baselines across the tested condition types.
  • The approach extends reasonably to tasks, condition types, or instructions never seen during training.
  • Meta-controllability is added while preserving the original LLM capabilities that would otherwise be lost to catastrophic forgetting.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same gating signal could be applied to multiple model components beyond the FFN to increase the range of controllable behaviors.
  • Because conditions are read as plain text, the method could be combined with retrieval or external memory to supply dynamic context at inference time.
  • A single base model trained this way might replace families of separately finetuned models for different user groups or domains.
  • The hypernetwork itself could be made smaller or shared across several LLMs to further lower the cost of adding new condition types.

Load-bearing premise

Dynamically generating the meta-signal β from arbitrary textual inputs via a hypernetwork produces stable adaptation without training instability or loss of the base model's original capabilities.

What would settle it

If training with the hypernetwork-generated β produces unstable loss curves or if inference on held-out benchmarks shows clear drops relative to the unmodified base model, the claimed controllability would not hold.
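One way to automate that check, sketched under loud assumptions: `evaluate(model, bench)` is a hypothetical helper returning benchmark accuracy, and `model.set_gating(flag)` is a hypothetical toggle that bypasses the hypernetwork; neither is known to exist in the paper's released code.

```python
def capability_regressions(model, benchmarks, evaluate, tolerance=0.01):
    """Flag held-out benchmarks where enabling the meta-gating hurts.

    Both `evaluate` and `model.set_gating` are assumed helpers. Any gap
    beyond `tolerance` would undercut the claim that base-model
    capabilities survive the added gating.
    """
    regressions = {}
    for bench in benchmarks:
        model.set_gating(False)                  # unmodified base model
        base_score = evaluate(model, bench)
        model.set_gating(True)                   # hypernetwork-driven beta active
        gated_score = evaluate(model, bench)
        if base_score - gated_score > tolerance:
            regressions[bench] = (base_score, gated_score)
    return regressions
```

Flat loss curves during hypernetwork training plus an empty `regressions` dict would be the positive outcome; either failure mode falsifies the premise above.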

Figures

Figures reproduced from arXiv: 2605.01973 by Hongyan Li, Luo Ji, Ningyuan Xi, Qingqing Gu, Qi Qin, Teng Chen.

Figure 1
Figure 1: Paradigm of MeGan, analogous to the neuro-system. A neuromodulator meta-controls classical neurotransmitters, shaping synaptic plasticity into meta-plasticity. Similarly, a hypernetwork converts textual condition inputs into meta-signals; these signals are combined with a layer-index embedding and then meta-control the gating of LLM FFNs. view at source ↗
Figure 2
Figure 2: Comparison of meta-learning paradigms on LLMs. view at source ↗
Figure 3
Figure 3: Framework of MeGan. (x, y, z) indicate the input, output, and condition, respectively. k ∈ [1, K] denotes the layer index. view at source ↗
Figure 4
Figure 4: Distribution of β with respect to textual conditions. view at source ↗
Figure 6
Figure 6: Comparison of MeGan to LoRA and SFT on PersonaChat and GYAFC, with respect to different model sizes. As model size decreases, MeGan's performance degrades less than the other two methods, indicating that it is less affected by the foundation model's capabilities. This phenomenon is more pronounced in the zero-shot case, highlighting the effectiveness of MeGan for meta-learning. view at source ↗
Figure 7
Figure 7: Averaged β with respect to different layers. β = 0 means SiLU; as β increases, the nonlinearity becomes stronger. view at source ↗
Figure 8
Figure 8: t-SNE analysis of averaged β on GYAFC (first 16 layers). view at source ↗
Figure 9
Figure 9: t-SNE analysis of averaged β on GYAFC (continued). view at source ↗
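Figures 7–9 are easier to read with the assumed activation family written out. The following is an editorial reconstruction consistent with the Figure 7 gloss that β = 0 means SiLU and that larger β strengthens the nonlinearity; it is not a formula quoted from the paper.

```latex
% Assumed beta-parameterized gate (editorial reconstruction):
\[
  \operatorname{SiLU}_{\beta}(x) = x\,\sigma\big((1+\beta)\,x\big),
  \qquad
  \operatorname{SiLU}_{0}(x) = x\,\sigma(x),
  \qquad
  \lim_{\beta\to\infty}\operatorname{SiLU}_{\beta}(x) = \max(x,0).
\]
```

Under this reading, β sweeps the gate from the smooth SiLU toward a hard ReLU, so the per-layer β averages in Figure 7 can be read directly as how sharply each layer gates its FFN under a given condition.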
read the original abstract

Conventional LLMs may suffer from corpus heterogeneity and subtle condition changes. While finetuning can create the catastrophe forgetting issue, application of meta-learning on LLMs is also limited due to its complexity and scalability. In this paper, we activate the meta-signal of $\beta$ within the SwiGLU blocks, resulting in a meta-gating mechanism that adaptively adjusts the nonlinearity of FFN. A hypernetwork is employed which dynamically produces $\beta$ on textual conditions, providing meta-controllability on LLMs. By testing on different condition types such as task, domain, persona, and style, our method outperforms finetuning and meta-learning baselines, and can generalize reasonably on unseen tasks, condition types, or instructions. Our code can be found in https://github.com/AaronJi/MeGan.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes activating a meta-signal β inside SwiGLU blocks to create a meta-gating mechanism that adaptively modulates the nonlinearity of the FFN in LLMs. A hypernetwork dynamically produces β from arbitrary textual conditions (task, domain, persona, style), enabling lightweight adaptation that the authors claim outperforms finetuning and meta-learning baselines while generalizing to unseen conditions; code is released at https://github.com/AaronJi/MeGan.

Significance. If the performance and generalization claims are substantiated, the approach could supply a scalable, parameter-efficient alternative to full finetuning or heavy meta-learning for conditioning LLMs on diverse textual inputs while reducing catastrophic forgetting. The open code release is a concrete strength that supports reproducibility and further investigation.

major comments (2)
  1. [Abstract] The central claim that the method 'outperforms finetuning and meta-learning baselines' and 'can generalize reasonably on unseen tasks, condition types, or instructions' is stated without any quantitative metrics, baseline specifications, statistical significance, ablation studies, or tables. This absence leaves the primary empirical contribution unsupported by visible evidence.
  2. [Experiments] Experiments section (presumed §4–5): no details are supplied on the concrete performance numbers across condition types, how the hypernetwork-generated β affects training stability or base-model capabilities, compute overhead relative to baselines, or controls testing the weakest assumption, namely that the dynamic gating remains stable and effective.
minor comments (1)
  1. [Abstract] The term 'meta-controllability' is introduced without a precise definition or link to the architectural components; a short clarifying sentence would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and valuable feedback on our manuscript. We address the major comments point by point below, and we will make revisions to improve the clarity and support of our empirical claims.

read point-by-point responses
  1. Referee: [Abstract] The central claim that the method 'outperforms finetuning and meta-learning baselines' and 'can generalize reasonably on unseen tasks, condition types, or instructions' is stated without any quantitative metrics, baseline specifications, statistical significance, ablation studies, or tables. This absence leaves the primary empirical contribution unsupported by visible evidence.

    Authors: We agree with the referee that the abstract would be strengthened by including quantitative support for the central claims. In the revised manuscript, we will update the abstract to include key performance metrics from our experiments, such as relative improvements over baselines, and reference the specific tables and statistical tests used. revision: yes

  2. Referee: [Experiments] Experiments section (presumed §4–5): no details are supplied on the concrete performance numbers across condition types, how the hypernetwork-generated β affects training stability or base-model capabilities, compute overhead relative to baselines, or controls testing the weakest assumption, namely that the dynamic gating remains stable and effective.

    Authors: We will revise the Experiments section to provide more explicit details on the performance numbers for each condition type, an analysis of the effects of the hypernetwork-generated β on training stability and preservation of base-model capabilities, comparisons of compute overhead, and additional experiments or controls to demonstrate the stability and effectiveness of the dynamic gating mechanism. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper introduces an architectural mechanism: a hypernetwork that dynamically generates a meta-signal β to modulate the SwiGLU nonlinearity in FFN blocks, conditioned on arbitrary textual inputs (task, domain, persona, style). This is presented as an empirical engineering addition rather than a mathematical derivation. Performance claims rest on direct testing against finetuning and meta-learning baselines, with reported generalization to unseen conditions; no equations or steps reduce a 'prediction' to a fitted quantity defined in terms of itself, nor do they rely on load-bearing self-citations or imported uniqueness theorems. The construction is self-contained and externally verifiable via the released code.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no explicit free parameters, mathematical axioms, or new postulated entities beyond the high-level description of the meta-gating and hypernetwork; all such details remain unspecified.

pith-pipeline@v0.9.0 · 5448 in / 1134 out tokens · 38636 ms · 2026-05-09T17:08:01.949800+00:00 · methodology


Reference graph

Works this paper leans on

119 extracted references · 46 canonical work pages · 7 internal anchors

  1. [1]

    Llama 3 model card

    AI@Meta. Llama 3 model card. 2024. URL https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md

  2. [4]

    PIQA: Reasoning about physical commonsense in natural language

    Bisk, Y., Zellers, R., Bras, R. L., Gao, J., and Choi, Y. PIQA: Reasoning about physical commonsense in natural language. In Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

  3. [6]

    Charakorn, R., Cetin, E., Tang, Y., and Lange, R. T. Text-to-LoRA: Instant transformer adaption. In Forty-second International Conference on Machine Learning, 2025. URL https://openreview.net/forum?id=zWskCdu3QA

  4. [7]

    Chen, M., Tworek, J., Jun, H., Yuan, Q., de Oliveira Pinto, H. P., Kaplan, J., Edwards, H., Burda, Y., Joseph, N., Brockman, G., Ray, A., Puri, R., Krueger, G., Petrov, M., Khlaaf, H., Sastry, G., Mishkin, P., Chan, B., Gray, S., Ryder, N., Pavlov, M., Power, A., Kaiser, L., Bavarian, M., Winter, C., Tillet, P., Such, F. P., Cummings, D., Plappert, M., Ch...

  5. [12]

    Meta-in-context learning in large language models

    Coda-Forno, J., Binz, M., Akata, Z., Botvinick, M., Wang, J., and Schulz, E. Meta-in-context learning in large language models. In Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., and Levine, S. (eds.), Advances in Neural Information Processing Systems, volume 36, pp. 65189–65201. Curran Associates, Inc., 2023

  6. [13]

    Language modeling is compression

    Delétang, G., Ruoss, A., Duquenne, P., Catt, E., Genewein, T., Mattern, C., Grau-Moya, J., Wenliang, L. K., Aitchison, M., Orseau, L., Hutter, M., and Veness, J. Language modeling is compression. In ICLR, 2024

  7. [14]

    Model-agnostic meta-learning for fast adaptation of deep networks

    Finn, C., Abbeel, P., and Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks. In Precup, D. and Teh, Y. W. (eds.), Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pp. 1126–1135. PMLR, 06–11 Aug 2017. URL https://proceedings.mlr.press/v70/finn17a.html

  8. [15]

    Grattafiori, A. et al. The llama 3 herd of models, 2024

  9. [16]

    Hypernetworks

    Ha, D., Dai, A. M., and Le, Q. V. Hypernetworks. In International Conference on Learning Representations, 2017. URL https://openreview.net/forum?id=rkpACe1lx

  10. [17]

    Towards a unified view of parameter-efficient transfer learning

    He, J., Zhou, C., Ma, X., Berg-Kirkpatrick, T., and Neubig, G. Towards a unified view of parameter-efficient transfer learning. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=0RDcd5Axok

  11. [18]

    Learning to optimize resource in dynamic wireless environment via meta-gating graph neural network

    Hou, Q., Lee, M., Yu, G., and Cai, Y. Learning to optimize resource in dynamic wireless environment via meta-gating graph neural network. In 2022 International Symposium on Wireless Communication Systems (ISWCS), pp. 1–6, 2022a. doi:10.1109/ISWCS56560.2022.9940416

  12. [19]

    Meta-gating framework for fast and continuous resource optimization in dynamic wireless environments

    Hou, Q., Lee, M., Yu, G., and Cai, Y. Meta-gating framework for fast and continuous resource optimization in dynamic wireless environments. IEEE Transactions on Communications, 71(9):5259–5273, 2023. doi:10.1109/TCOMM.2023.3292257

  13. [21]

    Parameter-efficient transfer learning for NLP

    Houlsby, N., Giurgiu, A., Jastrzebski, S., Morrone, B., De Laroussilhe, Q., Gesmundo, A., Attariyan, M., and Gelly, S. Parameter-efficient transfer learning for NLP. In Chaudhuri, K. and Salakhutdinov, R. (eds.), Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pp. 2790–2799. P...

  14. [22]

    LoRA: Low-rank adaptation of large language models

    Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., and Chen, W. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=nZeVKeeFYf9

  15. [25]

    WinoGrande: An adversarial winograd schema challenge at scale

    Sakaguchi, K., Le Bras, R., Bhagavatula, C., and Choi, Y. WinoGrande: An adversarial winograd schema challenge at scale. 2019

  16. [29]

    META-LORA: Memory-efficient sample reweighting for fine-tuning large language models

    Li, W., Zou, L., Tang, M., Yu, Q., Li, W., and Li, C. META-LORA: Memory-efficient sample reweighting for fine-tuning large language models. In Rambow, O., Wanner, L., Apidianaki, M., Al-Khalifa, H., Eugenio, B. D., and Schockaert, S. (eds.), Proceedings of the 31st International Conference on Computational Linguistics, pp. 8504–8517, Abu Dhabi, UAE, ...

  17. [30]

    ROUGE: A package for automatic evaluation of summaries

    Lin, C.-Y. ROUGE: A package for automatic evaluation of summaries. In Text summarization branches out, pp. 74–81, 2004

  18. [31]

    Metagater: Fast learning of conditional channel gated networks via federated meta-learning

    Lin, S., Yang, L., He, Z., Fan, D., and Zhang, J. MetaGater: Fast learning of conditional channel gated networks via federated meta-learning. In 2021 IEEE 18th International Conference on Mobile Ad Hoc and Smart Systems (MASS), pp. 164–172, 2021. doi:10.1109/MASS52906.2021.00031

  19. [32]

    Is your code generated by ChatGPT really correct? Rigorous evaluation of large language models for code generation

    Liu, J., Xia, C. S., Wang, Y., and Zhang, L. Is your code generated by ChatGPT really correct? Rigorous evaluation of large language models for code generation. In Thirty-seventh Conference on Neural Information Processing Systems, 2023. URL https://openreview.net/forum?id=1qvx610Cu7

  20. [33]

    Can a suit of armor conduct electricity? a new dataset for open book question answering

    Mihaylov, T., Clark, P., Khot, T., and Sabharwal, A. Can a suit of armor conduct electricity? a new dataset for open book question answering. In EMNLP, 2018

  21. [36]

    BLEU: a method for automatic evaluation of machine translation

    Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting of the Association for Computational Linguistics, pp. 311–318, 2002

  22. [37]

    In-context Learning vs. Instruction Tuning: The Case of Small and Multilingual Language Models

    Ponce, D. and Etchegoyhen, T. In-context learning vs. instruction tuning: The case of small and multilingual language models, 2025. URL https://arxiv.org/abs/2503.01611

  23. [39]

    Gated attention for large language models: Non-linearity, sparsity, and attention-sink-free

    Qiu, Z., Wang, Z., Zheng, B., Huang, Z., Wen, K., Yang, S., Men, R., Yu, L., Huang, F., Huang, S., Liu, D., Zhou, J., and Lin, J. Gated attention for large language models: Non-linearity, sparsity, and attention-sink-free. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. URL https://openreview.net/forum?id=1b7whO4SfY

  24. [40]

    Qwen Team, Alibaba Group. Qwen2 Technical Report. Technical report, 2024

  25. [41]

    Dear sir or madam, may I introduce the GYAFC dataset: Corpus, benchmarks and metrics for formality style transfer

    Rao, S. and Tetreault, J. R. Dear sir or madam, may I introduce the GYAFC dataset: Corpus, benchmarks and metrics for formality style transfer. In North American Chapter of the Association for Computational Linguistics, 2018

  26. [43]

    Schmidhuber, J. Driven by compression progress: A simple principle explains essential aspects of subjective beauty, novelty, surprise, interestingness, attention, curiosity, creativity, art, science, music, jokes. In Pezzulo, G., Butz, M. V., Sigaud, O., and Baldassarre, G. (eds.), Anticipatory Behavior in Adaptive Learning Systems, pp. 48–76, Berlin, H...

  27. [45]

    Silver, R. A. Neuronal arithmetic. Nature Reviews Neuroscience, 11:474–489, 2010. URL https://api.semanticscholar.org/CorpusID:205505926

  28. [47]

    Recursive deep models for semantic compositionality over a sentiment treebank

    Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C. D., Ng, A., and Potts, C. Recursive deep models for semantic compositionality over a sentiment treebank. In Yarowsky, D., Baldwin, T., Korhonen, A., Livescu, K., and Bethard, S. (eds.), Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1631–1642, Seattle...

  29. [50]

    An observation on generalization

    Sutskever, I. An observation on generalization. Simons Institute workshop on Large Language Models and Transformers, 2023. URL https://simons.berkeley.edu/talks/ilya-sutskever-openai-2023-08-14

  30. [51]

    HydraLoRA: An asymmetric LoRA architecture for efficient fine-tuning

    Tian, C., Shi, Z., Guo, Z., Li, L., and Xu, C.-Z. HydraLoRA: An asymmetric LoRA architecture for efficient fine-tuning. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. URL https://openreview.net/forum?id=qEpi8uWX3N

  31. [52]

    Deep learning and the information bottleneck principle

    Tishby, N. and Zaslavsky, N. Deep learning and the information bottleneck principle. In 2015 IEEE Information Theory Workshop (ITW), pp. 1–5, 2015. doi:10.1109/ITW.2015.7133169

  32. [56]

    Super-NaturalInstructions: Generalization via declarative instructions on 1600+ NLP tasks

    Wang, Y., Mishra, S., Alipoormolabashi, P., Kordi, Y., Mirzaei, A., Naik, A., Ashok, A., Dhanasekaran, A. S., Arunkumar, A., Stap, D., et al. Super-NaturalInstructions: Generalization via declarative instructions on 1600+ NLP tasks. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pp. 5085–5109, 2022b

  33. [57]

    Chain-of-thought prompting elicits reasoning in large language models

    Wei, J., Wang, X., Schuurmans, D., Bosma, M., Xia, F., Chi, E., Le, Q. V., Zhou, D., et al. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35:24824–24837, 2022

  34. [61]

    HellaSwag: Can a machine really finish your sentence?

    Zellers, R., Holtzman, A., Bisk, Y., Farhadi, A., and Choi, Y. HellaSwag: Can a machine really finish your sentence? In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019

  35. [65]

    Llama 3 model card

    AI@Meta. Llama 3 model card. 2024. URL https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md

  36. [66]

    The Llama 3 herd of models

    Grattafiori, A. et al. The Llama 3 herd of models, 2024

  37. [67]

    Qwen2 Technical Report

    Qwen Team, Alibaba Group. Qwen2 Technical Report. Technical report, 2024

  38. [68]

    Mistral 7B

    Mistral 7B. arXiv preprint arXiv:2310.06825, 2023

  39. [69]

    LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models

    LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models. arXiv preprint arXiv:2403.13372, 2024

  40. [70]

    Efficient Memory Management for Large Language Model Serving with PagedAttention

    Efficient Memory Management for Large Language Model Serving with PagedAttention. In Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles, 2023

  41. [71]

    Metadata Conditioning Accelerates Language Model Pre-training

    Metadata Conditioning Accelerates Language Model Pre-training. In Forty-second International Conference on Machine Learning, 2025

  42. [72]

    HyperNetworks

    Ha, D., Dai, A. M., and Le, Q. V. HyperNetworks. In International Conference on Learning Representations, 2017

  43. [73]

    Reinhart et al., Proceedings of the National Academy of Sciences, 2025

    Reinhart, A., Markey, B., Laudenbach, M., Pantusen, K., Yurko, R., Weinberg, G., and Brown, D. W. Proceedings of the National Academy of Sciences, 2025. URL https://www.pnas.org/doi/pdf/10.1073/pnas.2422455122

  44. [74]

    Comparing LLM-generated and human-authored news text using formal syntactic theory

    Zamaraeva, Olga and Flickinger, Dan and Bond, Francis and Gómez-Rodríguez, Carlos. Comparing LLM-generated and human-authored news text using formal syntactic theory. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2025. doi:10.18653/v1/2025.acl-long.443

  45. [75]

    Substance over Style: Evaluating Proactive Conversational Coaching Agents

    Srinivas, Vidya and Xu, Xuhai and Liu, Xin and Ayush, Kumar and Galatzer-Levy, Isaac and Patel, Shwetak and McDuff, Daniel and Althoff, Tim. Substance over Style: Evaluating Proactive Conversational Coaching Agents. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2025. doi:10.18653/v1/2025.a...

  46. [76]

    A framework for authorship identification of online messages: Writing-style features and classification techniques

    A framework for authorship identification of online messages: Writing-style features and classification techniques. 2006

  47. [77]

    Stylized Dialogue Generation with Feature-Guided Knowledge Augmentation

    Li, Jinpeng and Zhang, Zekai and Chen, Xiuying and Zhao, Dongyan and Yan, Rui. Stylized Dialogue Generation with Feature-Guided Knowledge Augmentation. Findings of the Association for Computational Linguistics: EMNLP 2023. 2023. doi:10.18653/v1/2023.findings-emnlp.475

  48. [78]

    Stylistic Response Generation by Controlling Personality Traits and Intent

    Saha, Sougata and Das, Souvik and Srihari, Rohini. Stylistic Response Generation by Controlling Personality Traits and Intent. Proceedings of the 4th Workshop on NLP for Conversational AI. 2022. doi:10.18653/v1/2022.nlp4convai-1.16

  49. [79]

    StyleDGPT: Stylized Response Generation with Pre-trained Language Models

    Yang, Ze and Wu, Wei and Xu, Can and Liang, Xinnian and Bai, Jiaqi and Wang, Liran and Wang, Wei and Li, Zhoujun. StyleDGPT: Stylized Response Generation with Pre-trained Language Models. Findings of the Association for Computational Linguistics: EMNLP 2020. 2020. doi:10.18653/v1/2020.findings-emnlp.140

  50. [80]

    Paraphrasing for Style

    Paraphrasing for Style. In COLING

  51. [81]

    Dear Sir or Madam, May I Introduce the GYAFC Dataset: Corpus, Benchmarks and Metrics for Formality Style Transfer

    Rao, S. and Tetreault, J. R. Dear Sir or Madam, May I Introduce the GYAFC Dataset: Corpus, Benchmarks and Metrics for Formality Style Transfer. In North American Chapter of the Association for Computational Linguistics, 2018

  52. [82]

    The Moral Integrity Corpus: A Benchmark for Ethical Dialogue Systems

    Ziems, Caleb and Yu, Jane and Wang, Yi-Chia and Halevy, Alon and Yang, Diyi. The Moral Integrity Corpus: A Benchmark for Ethical Dialogue Systems. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022. doi:10.18653/v1/2022.acl-long.261

  53. [83]

    Towards emotional support dialog systems

    Liu, Siyang and Zheng, Chujie and Demasi, Orianna and Sabour, Sahand and Li, Yu and Yu, Zhou and Jiang, Yong and Huang, Minlie. Towards Emotional Support Dialog Systems. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)....

  54. [84]

    DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset

    Li, Yanran and Su, Hui and Shen, Xiaoyu and Li, Wenjie and Cao, Ziqiang and Niu, Shuzi. DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset. Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2017

  55. [85]

    Emotion Detection on TV Show Transcripts with Sequence-Based Convolutional Neural Networks

    Emotion Detection on TV Show Transcripts with Sequence-Based Convolutional Neural Networks. AAAI Workshops

  56. [86]

    Towards Empathetic Open-domain Conversation Models: A New Benchmark and Dataset

    Rashkin, Hannah and Smith, Eric Michael and Li, Margaret and Boureau, Y-Lan. Towards Empathetic Open-domain Conversation Models: A New Benchmark and Dataset. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019. doi:10.18653/v1/P19-1534

  57. [87]

    Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank

    Socher, Richard and Perelygin, Alex and Wu, Jean and Chuang, Jason and Manning, Christopher D. and Ng, Andrew and Potts, Christopher. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. 2013

  58. [88]

    Learning Word Vectors for Sentiment Analysis

    Maas, Andrew L. and Daly, Raymond E. and Pham, Peter T. and Huang, Dan and Ng, Andrew Y. and Potts, Christopher. Learning Word Vectors for Sentiment Analysis. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. 2011

  59. [89]

    Sentiment analysis using product review data

    Sentiment analysis using product review data. Journal of Big Data

  60. [90]

    Adaptive Prompt Routing for Arbitrary Text Style Transfer with Pre-trained Language Models

    Adaptive Prompt Routing for Arbitrary Text Style Transfer with Pre-trained Language Models. In AAAI Conference on Artificial Intelligence

  61. [91]

    BLEU: a method for automatic evaluation of machine translation

    Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting of the Association for Computational Linguistics, pp. 311–318, 2002

  62. [92]

    ROUGE: A package for automatic evaluation of summaries

    Lin, C.-Y. ROUGE: A package for automatic evaluation of summaries. In Text summarization branches out, pp. 74–81, 2004

  63. [93]

    A Diversity-Promoting Objective Function for Neural Conversation Models

    A Diversity-Promoting Objective Function for Neural Conversation Models. arXiv preprint arXiv:1510.03055, 2015

  64. [94]

    An Integrative Theory of Locus Coeruleus-Norepinephrine Function: Adaptive Gain and Optimal Performance

    Aston-Jones, Gary and Cohen, Jonathan D. An Integrative Theory of Locus Coeruleus-Norepinephrine Function: Adaptive Gain and Optimal Performance. Annual Review of Neuroscience. 2005. doi:10.1146/annurev.neuro.28.061604.135709

  65. [95]

    Servan-Schreiber, Printz, and Cohen, Science, 1990

    Servan-Schreiber, David and Printz, Harry and Cohen, Jonathan D. Science, 1990. URL https://www.science.org/doi/pdf/10.1126/science.2392679

  66. [96]

    Neuronal arithmetic

    Silver, R. A. Neuronal arithmetic. Nature Reviews Neuroscience, 11:474–489, 2010

  67. [97]

    MetaGater: Fast Learning of Conditional Channel Gated Networks via Federated Meta-Learning

    Lin, Sen and Yang, Li and He, Zhezhi and Fan, Deliang and Zhang, Junshan. MetaGater: Fast Learning of Conditional Channel Gated Networks via Federated Meta-Learning. 2021

  68. [98]

    Learning to Optimize Resource in Dynamic Wireless Environment via Meta-Gating Graph Neural Network

    Hou, Qiushuo and Lee, Mengyuan and Yu, Guanding and Cai, Yunlong. Learning to Optimize Resource in Dynamic Wireless Environment via Meta-Gating Graph Neural Network. 2022

  69. [99]

    Meta-Gating Framework for Fast and Continuous Resource Optimization in Dynamic Wireless Environments

    Hou, Qiushuo and Lee, Mengyuan and Yu, Guanding and Cai, Yunlong. Meta-Gating Framework for Fast and Continuous Resource Optimization in Dynamic Wireless Environments. IEEE Transactions on Communications, 2023

  70. [100]

    Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free

    Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

  71. [101]

    CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP

    Ye, Qinyuan and Lin, Bill Yuchen and Ren, Xiang. CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 2021. doi:10.18653/v1/2021.emnlp-main.572

  72. [102]

    UnifiedQA: Crossing Format Boundaries with a Single QA System

    Khashabi, Daniel and Min, Sewon and Khot, Tushar and Sabharwal, Ashish and Tafjord, Oyvind and Clark, Peter and Hajishirzi, Hannaneh. UnifiedQA: Crossing Format Boundaries with a Single QA System. Findings of the Association for Computational Linguistics: EMNLP 2020. 2020. doi:10.18653/v1/2020.findings-emnlp.171

  73. [103]

    Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks

    Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pp. 5085–5109, 2022

  74. [104]

    Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks

    Wang, Yizhong and Mishra, Swaroop and Alipoormolabashi, Pegah and Kordi, Yeganeh and Mirzaei, Amirreza and Naik, Atharva and Ashok, Arjun and Dhanasekaran, Arut Selvan and Arunkumar, Anjana and Stap, David and Pathak, Eshaan and Karamanolakis, Giannis and Lai, Haizhi and Purohit, Ishan and Mondal, Ishani and Anderson, Jacob and Kuznia, Kirby and Doshi, Kr...

  75. [105]

    Personalizing Dialogue Agents: I have a dog, do you have pets too?

    Zhang, Saizheng and Dinan, Emily and Urbanek, Jack and Szlam, Arthur and Kiela, Douwe and Weston, Jason. Personalizing Dialogue Agents: I have a dog, do you have pets too?. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2018. doi:10.18653/v1/P18-1205

  76. [106]

    AdaptSum: Towards Low-Resource Domain Adaptation for Abstractive Summarization

    Yu, Tiezheng and Liu, Zihan and Fung, Pascale. AdaptSum: Towards Low-Resource Domain Adaptation for Abstractive Summarization. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021. doi:10.18653/v1/2021.naacl-main.471

  77. [107]

    Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge

    Clark, Peter and Cowhey, Isaac and Etzioni, Oren and Khot, Tushar and Sabharwal, Ashish and Schoenick, Carissa and Tafjord, Oyvind. Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. arXiv:1803.05457v1, 2018

  78. [108]

    BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions

    BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions. arXiv preprint arXiv:1905.10044, 2019

  79. [109]

    Training Verifiers to Solve Math Word Problems

    Training Verifiers to Solve Math Word Problems. arXiv preprint arXiv:2110.14168, 2021

  80. [110]

    HellaSwag: Can a Machine Really Finish Your Sentence?

    Zellers, R., Holtzman, A., Bisk, Y., Farhadi, A., and Choi, Y. HellaSwag: Can a Machine Really Finish Your Sentence? In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019

Showing first 80 references.