pith. machine review for the scientific record.

arxiv: 2605.01973 · v1 · submitted 2026-05-03 · 💻 cs.CL · cs.LG

Recognition: unknown

Learn-to-learn on Arbitrary Textual Conditioning: A Hypernetwork-Driven Meta-Gated LLM

Authors on Pith: no claims yet

Pith reviewed 2026-05-09 17:08 UTC · model grok-4.3

classification 💻 cs.CL cs.LG
keywords hypernetwork · meta-gating · SwiGLU · textual conditioning · LLM adaptation · meta-learning · controllable generation

The pith

A hypernetwork produces a meta-signal β that gates the SwiGLU nonlinearity, letting LLMs adapt to arbitrary textual conditions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes activating a controllable meta-signal β inside the SwiGLU blocks of an LLM's feed-forward network, which then modulates the strength of the nonlinearity applied during computation. A separate hypernetwork reads any textual condition and outputs the appropriate β value on the fly, giving the base model a form of meta-controllability without weight updates. This design is tested on conditions that specify tasks, domains, personas, and styles, where it beats both standard finetuning and existing meta-learning methods while showing reasonable success on conditions never encountered in training. The approach directly tackles corpus heterogeneity and the risk of catastrophic forgetting that comes with repeated finetuning. If the mechanism works as described, it supplies a lightweight route for steering large models toward new behaviors using only natural-language instructions.
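A minimal sketch of the mechanism in PyTorch, to make the moving parts concrete. The abstract does not give the exact parameterization; following the Figure 7 note that β = 0 recovers SiLU, the sketch assumes the gate activation x·σ((1+β)x), and the module names and shapes are illustrative guesses rather than the paper's released code.

```python
import torch
import torch.nn as nn

class MetaGatedSwiGLU(nn.Module):
    """SwiGLU feed-forward block whose nonlinearity is modulated by a
    per-layer meta-signal beta produced by a separate hypernetwork.

    Assumed gate: g * sigmoid((1 + beta) * g), so beta = 0 recovers the
    standard SiLU. This parameterization is an editorial assumption
    consistent with Figure 7, not a formula quoted from the paper.
    """

    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_ff, bias=False)
        self.w_up = nn.Linear(d_model, d_ff, bias=False)
        self.w_down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor, beta: torch.Tensor) -> torch.Tensor:
        g = self.w_gate(x)                           # gate pre-activation
        gated = g * torch.sigmoid((1.0 + beta) * g)  # beta-modulated SiLU
        return self.w_down(gated * self.w_up(x))     # standard SwiGLU product
```

With β held at 0 the block reduces to an ordinary SwiGLU FFN, which is what would let the mechanism sit on top of a pretrained model without touching its weights.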

Core claim

Activating the meta-signal β within the SwiGLU blocks creates a meta-gating mechanism that adaptively adjusts the nonlinearity of the feed-forward network. A hypernetwork dynamically produces β from textual conditions, providing meta-controllability over the LLM. Tested on condition types including task, domain, persona, and style, the method outperforms finetuning and meta-learning baselines and generalizes reasonably to unseen tasks, condition types, and instructions.

What carries the argument

A hypernetwork that dynamically outputs the meta-signal β to gate the nonlinearity inside each SwiGLU block of the FFN.
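A hedged sketch of that hypernetwork, assuming (per the Figure 1 caption) that an embedding of the condition text is combined with a learned layer-index embedding and mapped to one β per FFN layer; the encoder choice, MLP head, and dimensions are editorial placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn

class BetaHypernetwork(nn.Module):
    """Maps an encoded textual condition to one meta-signal beta per layer.

    `cond` is assumed to be a fixed-size embedding of the condition text
    from any frozen encoder. Adding a learned layer-index embedding
    follows the Figure 1 caption; the small MLP head is an assumption.
    """

    def __init__(self, cond_dim: int, num_layers: int, hidden: int = 256):
        super().__init__()
        self.num_layers = num_layers
        self.layer_emb = nn.Embedding(num_layers, cond_dim)
        self.head = nn.Sequential(
            nn.Linear(cond_dim, hidden),
            nn.SiLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, cond: torch.Tensor) -> torch.Tensor:
        # cond: (batch, cond_dim)  ->  betas: (batch, num_layers)
        idx = torch.arange(self.num_layers, device=cond.device)
        per_layer = cond.unsqueeze(1) + self.layer_emb(idx)
        return self.head(per_layer).squeeze(-1)
```

If only this module is trained, the base model's weights stay frozen, which matches the pith's reading that adaptation happens without weight updates.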

If this is right

  • The model handles textual conditions specifying tasks, domains, personas, and styles without requiring weight updates.
  • Performance exceeds both direct finetuning and prior meta-learning baselines across the tested condition types.
  • The approach extends reasonably to tasks, condition types, or instructions never seen during training.
  • Meta-controllability is added while preserving the original LLM capabilities that would otherwise be lost to catastrophic forgetting.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same gating signal could be applied to multiple model components beyond the FFN to increase the range of controllable behaviors.
  • Because conditions are read as plain text, the method could be combined with retrieval or external memory to supply dynamic context at inference time.
  • A single base model trained this way might replace families of separately finetuned models for different user groups or domains.
  • The hypernetwork itself could be made smaller or shared across several LLMs to further lower the cost of adding new condition types.

Load-bearing premise

Dynamically generating the meta-signal β from arbitrary textual inputs via a hypernetwork produces stable adaptation without training instability or loss of the base model's original capabilities.

What would settle it

If training with the hypernetwork-generated β produces unstable loss curves or if inference on held-out benchmarks shows clear drops relative to the unmodified base model, the claimed controllability would not hold.
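One way to automate that check, sketched under loud assumptions: `evaluate(model, bench)` is a hypothetical helper returning benchmark accuracy, and `model.set_gating(flag)` is a hypothetical toggle that bypasses the hypernetwork; neither is known to exist in the paper's released code.

```python
def capability_regressions(model, benchmarks, evaluate, tolerance=0.01):
    """Flag held-out benchmarks where enabling the meta-gating hurts.

    Both `evaluate` and `model.set_gating` are assumed helpers. Any gap
    beyond `tolerance` would undercut the claim that base-model
    capabilities survive the added gating.
    """
    regressions = {}
    for bench in benchmarks:
        model.set_gating(False)                  # unmodified base model
        base_score = evaluate(model, bench)
        model.set_gating(True)                   # hypernetwork-driven beta active
        gated_score = evaluate(model, bench)
        if base_score - gated_score > tolerance:
            regressions[bench] = (base_score, gated_score)
    return regressions
```

Flat loss curves during hypernetwork training plus an empty `regressions` dict would be the positive outcome; either failure mode falsifies the premise above.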

Figures

Figures reproduced from arXiv: 2605.01973 by Hongyan Li, Luo Ji, Ningyuan Xi, Qingqing Gu, Qi Qin, Teng Chen.

Figure 1
Figure 1: Paradigm of MeGan, analogous to the neuro-system. A neuromodulator meta-controls classical neurotransmitters, shaping synaptic plasticity into meta-plasticity. Similarly, a hypernetwork converts textual condition inputs into meta-signals; these signals are combined with a layer-index embedding and then meta-control the gating of LLM FFNs. view at source ↗
Figure 2
Figure 2: Comparison of meta-learning paradigms on LLMs. view at source ↗
Figure 3
Figure 3: Framework of MeGan. (x, y, z) indicate the input, output, and condition, respectively. k ∈ [1, K] denotes the layer index. view at source ↗
Figure 4
Figure 4: Distribution of β with respect to textual conditions. view at source ↗
Figure 6
Figure 6: Comparison of MeGan to LoRA and SFT on PersonaChat and GYAFC, with respect to different model sizes. As model size decreases, MeGan's performance degrades less than the other two methods, indicating that it is less affected by the foundation model's capabilities. This phenomenon is more pronounced in the zero-shot case, highlighting the effectiveness of MeGan for meta-learning. view at source ↗
Figure 7
Figure 7: Averaged β with respect to different layers. β = 0 means SiLU; as β increases, the nonlinearity becomes stronger. view at source ↗
Figure 8
Figure 8: t-SNE analysis of averaged β on GYAFC (first 16 layers). view at source ↗
Figure 9
Figure 9: t-SNE analysis of averaged β on GYAFC (continued). view at source ↗
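Figures 7–9 are easier to read with the assumed activation family written out. The following is an editorial reconstruction consistent with the Figure 7 gloss that β = 0 means SiLU and that larger β strengthens the nonlinearity; it is not a formula quoted from the paper.

```latex
% Assumed beta-parameterized gate (editorial reconstruction):
\[
  \operatorname{SiLU}_{\beta}(x) = x\,\sigma\big((1+\beta)\,x\big),
  \qquad
  \operatorname{SiLU}_{0}(x) = x\,\sigma(x),
  \qquad
  \lim_{\beta\to\infty}\operatorname{SiLU}_{\beta}(x) = \max(x,0).
\]
```

Under this reading, β sweeps the gate from the smooth SiLU toward a hard ReLU, so the per-layer β averages in Figure 7 can be read directly as how sharply each layer gates its FFN under a given condition.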
read the original abstract

Conventional LLMs may suffer from corpus heterogeneity and subtle condition changes. While finetuning can create the catastrophe forgetting issue, application of meta-learning on LLMs is also limited due to its complexity and scalability. In this paper, we activate the meta-signal of $\beta$ within the SwiGLU blocks, resulting in a meta-gating mechanism that adaptively adjusts the nonlinearity of FFN. A hypernetwork is employed which dynamically produces $\beta$ on textual conditions, providing meta-controllability on LLMs. By testing on different condition types such as task, domain, persona, and style, our method outperforms finetuning and meta-learning baselines, and can generalize reasonably on unseen tasks, condition types, or instructions. Our code can be found in https://github.com/AaronJi/MeGan.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes activating a meta-signal β inside SwiGLU blocks to create a meta-gating mechanism that adaptively modulates the nonlinearity of the FFN in LLMs. A hypernetwork dynamically produces β from arbitrary textual conditions (task, domain, persona, style), enabling lightweight adaptation that the authors claim outperforms finetuning and meta-learning baselines while generalizing to unseen conditions; code is released at https://github.com/AaronJi/MeGan.

Significance. If the performance and generalization claims are substantiated, the approach could supply a scalable, parameter-efficient alternative to full finetuning or heavy meta-learning for conditioning LLMs on diverse textual inputs while reducing catastrophic forgetting. The open code release is a concrete strength that supports reproducibility and further investigation.

major comments (2)
  1. [Abstract] The central claim that the method 'outperforms finetuning and meta-learning baselines' and 'can generalize reasonably on unseen tasks, condition types, or instructions' is stated without any quantitative metrics, baseline specifications, statistical significance, ablation studies, or tables. This absence leaves the primary empirical contribution unsupported by visible evidence.
  2. [Experiments] Experiments section (presumed §4–5): no details are supplied on the concrete performance numbers across condition types, how the hypernetwork-generated β affects training stability or base-model capabilities, compute overhead relative to baselines, or controls testing the weakest assumption, namely that the dynamic gating remains stable and effective.
minor comments (1)
  1. [Abstract] The term 'meta-controllability' is introduced without a precise definition or link to the architectural components; a short clarifying sentence would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and valuable feedback on our manuscript. We address the major comments point by point below, and we will make revisions to improve the clarity and support of our empirical claims.

read point-by-point responses
  1. Referee: [Abstract] The central claim that the method 'outperforms finetuning and meta-learning baselines' and 'can generalize reasonably on unseen tasks, condition types, or instructions' is stated without any quantitative metrics, baseline specifications, statistical significance, ablation studies, or tables. This absence leaves the primary empirical contribution unsupported by visible evidence.

    Authors: We agree with the referee that the abstract would be strengthened by including quantitative support for the central claims. In the revised manuscript, we will update the abstract to include key performance metrics from our experiments, such as relative improvements over baselines, and reference the specific tables and statistical tests used. revision: yes

  2. Referee: [Experiments] Experiments section (presumed §4–5): no details are supplied on the concrete performance numbers across condition types, how the hypernetwork-generated β affects training stability or base-model capabilities, compute overhead relative to baselines, or controls testing the weakest assumption, namely that the dynamic gating remains stable and effective.

    Authors: We will revise the Experiments section to provide more explicit details on the performance numbers for each condition type, an analysis of the effects of the hypernetwork-generated β on training stability and preservation of base-model capabilities, comparisons of compute overhead, and additional experiments or controls to demonstrate the stability and effectiveness of the dynamic gating mechanism. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper introduces an architectural mechanism: a hypernetwork that dynamically generates a meta-signal β to modulate the SwiGLU nonlinearity in FFN blocks, conditioned on arbitrary textual inputs (task, domain, persona, style). This is presented as an empirical engineering addition rather than a mathematical derivation. Performance claims rest on direct testing against finetuning and meta-learning baselines, with reported generalization to unseen conditions; no equations or steps reduce a 'prediction' to a fitted quantity defined in terms of itself, nor do they rely on load-bearing self-citations or imported uniqueness theorems. The construction is self-contained and externally verifiable via the released code.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no explicit free parameters, mathematical axioms, or new postulated entities beyond the high-level description of the meta-gating and hypernetwork; all such details remain unspecified.

pith-pipeline@v0.9.0 · 5448 in / 1134 out tokens · 38636 ms · 2026-05-09T17:08:01.949800+00:00 · methodology


Reference graph

Works this paper leans on

119 extracted references · 46 canonical work pages · 7 internal anchors

  1. [1]

    Llama 3 model card

    AI@Meta. Llama 3 model card. 2024. URL https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md

  2. [4]

    PIQA: Reasoning about physical commonsense in natural language

    Bisk, Y., Zellers, R., Bras, R. L., Gao, J., and Choi, Y. PIQA: Reasoning about physical commonsense in natural language. In Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

  3. [6]

    Charakorn, R., Cetin, E., Tang, Y., and Lange, R. T. Text-to-LoRA: Instant transformer adaption. In Forty-second International Conference on Machine Learning, 2025. URL https://openreview.net/forum?id=zWskCdu3QA

  4. [7]

    Chen, M., Tworek, J., Jun, H., Yuan, Q., de Oliveira Pinto, H. P., Kaplan, J., Edwards, H., Burda, Y., Joseph, N., Brockman, G., Ray, A., Puri, R., Krueger, G., Petrov, M., Khlaaf, H., Sastry, G., Mishkin, P., Chan, B., Gray, S., Ryder, N., Pavlov, M., Power, A., Kaiser, L., Bavarian, M., Winter, C., Tillet, P., Such, F. P., Cummings, D., Plappert, M., Ch...

  5. [12]

    Meta-in-context learning in large language models

    Coda-Forno, J., Binz, M., Akata, Z., Botvinick, M., Wang, J., and Schulz, E. Meta-in-context learning in large language models. In Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., and Levine, S. (eds.), Advances in Neural Information Processing Systems, volume 36, pp. 65189–65201. Curran Associates, Inc., 2023

  6. [13]

    Language modeling is compression

    Delétang, G., Ruoss, A., Duquenne, P., Catt, E., Genewein, T., Mattern, C., Grau-Moya, J., Wenliang, L. K., Aitchison, M., Orseau, L., Hutter, M., and Veness, J. Language modeling is compression. In ICLR, 2024

  7. [14]

    Model-agnostic meta-learning for fast adaptation of deep networks

    Finn, C., Abbeel, P., and Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks. In Precup, D. and Teh, Y. W. (eds.), Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pp. 1126–1135. PMLR, 06–11 Aug 2017. URL https://proceedings.mlr.press/v70/finn17a.html

  8. [15]

    Grattafiori, A. et al. The llama 3 herd of models, 2024

  9. [16]

    Hypernetworks

    Ha, D., Dai, A. M., and Le, Q. V. Hypernetworks. In International Conference on Learning Representations, 2017. URL https://openreview.net/forum?id=rkpACe1lx

  10. [17]

    Towards a unified view of parameter-efficient transfer learning

    He, J., Zhou, C., Ma, X., Berg-Kirkpatrick, T., and Neubig, G. Towards a unified view of parameter-efficient transfer learning. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=0RDcd5Axok

  11. [18]

    Learning to optimize resource in dynamic wireless environment via meta-gating graph neural network

    Hou, Q., Lee, M., Yu, G., and Cai, Y. Learning to optimize resource in dynamic wireless environment via meta-gating graph neural network. In 2022 International Symposium on Wireless Communication Systems (ISWCS), pp. 1–6, 2022a. doi:10.1109/ISWCS56560.2022.9940416

  12. [19]

    Meta-gating framework for fast and continuous resource optimization in dynamic wireless environments

    Hou, Q., Lee, M., Yu, G., and Cai, Y. Meta-gating framework for fast and continuous resource optimization in dynamic wireless environments. IEEE Transactions on Communications, 71(9):5259–5273, 2023. doi:10.1109/TCOMM.2023.3292257

  13. [21]

    Parameter-efficient transfer learning for NLP

    Houlsby, N., Giurgiu, A., Jastrzebski, S., Morrone, B., De Laroussilhe, Q., Gesmundo, A., Attariyan, M., and Gelly, S. Parameter-efficient transfer learning for NLP. In Chaudhuri, K. and Salakhutdinov, R. (eds.), Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pp. 2790–2799. P...

  14. [22]

    LoRA: Low-rank adaptation of large language models

    Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., and Chen, W. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=nZeVKeeFYf9

  15. [25]

    WinoGrande: An adversarial winograd schema challenge at scale

    Sakaguchi, K., Le Bras, R., Bhagavatula, C., and Choi, Y. WinoGrande: An adversarial winograd schema challenge at scale. 2019

  16. [29]

    META-LORA: Memory-efficient sample reweighting for fine-tuning large language models

    Li, W., Zou, L., Tang, M., Yu, Q., Li, W., and Li, C. META-LORA: Memory-efficient sample reweighting for fine-tuning large language models. In Rambow, O., Wanner, L., Apidianaki, M., Al-Khalifa, H., Eugenio, B. D., and Schockaert, S. (eds.), Proceedings of the 31st International Conference on Computational Linguistics, pp. 8504–8517, Abu Dhabi, UAE, ...

  17. [30]

    ROUGE: A package for automatic evaluation of summaries

    Lin, C.-Y. ROUGE: A package for automatic evaluation of summaries. In Text summarization branches out, pp. 74–81, 2004

  18. [31]

    Metagater: Fast learning of conditional channel gated networks via federated meta-learning

    Lin, S., Yang, L., He, Z., Fan, D., and Zhang, J. MetaGater: Fast learning of conditional channel gated networks via federated meta-learning. In 2021 IEEE 18th International Conference on Mobile Ad Hoc and Smart Systems (MASS), pp. 164–172, 2021. doi:10.1109/MASS52906.2021.00031

  19. [32]

    Is your code generated by ChatGPT really correct? Rigorous evaluation of large language models for code generation

    Liu, J., Xia, C. S., Wang, Y., and Zhang, L. Is your code generated by ChatGPT really correct? Rigorous evaluation of large language models for code generation. In Thirty-seventh Conference on Neural Information Processing Systems, 2023. URL https://openreview.net/forum?id=1qvx610Cu7

  20. [33]

    Can a suit of armor conduct electricity? a new dataset for open book question answering

    Mihaylov, T., Clark, P., Khot, T., and Sabharwal, A. Can a suit of armor conduct electricity? a new dataset for open book question answering. In EMNLP, 2018

  21. [36]

    BLEU: a method for automatic evaluation of machine translation

    Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting of the Association for Computational Linguistics, pp. 311–318, 2002

  22. [37]

    In-context Learning vs. Instruction Tuning: The Case of Small and Multilingual Language Models

    Ponce, D. and Etchegoyhen, T. In-context learning vs. instruction tuning: The case of small and multilingual language models, 2025. URL https://arxiv.org/abs/2503.01611

  23. [39]

    Gated attention for large language models: Non-linearity, sparsity, and attention-sink-free

    Qiu, Z., Wang, Z., Zheng, B., Huang, Z., Wen, K., Yang, S., Men, R., Yu, L., Huang, F., Huang, S., Liu, D., Zhou, J., and Lin, J. Gated attention for large language models: Non-linearity, sparsity, and attention-sink-free. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. URL https://openreview.net/forum?id=1b7whO4SfY

  24. [40]

    Qwen Team, Alibaba Group. Qwen2 Technical Report. Technical report, 2024

  25. [41]

    Dear sir or madam, may I introduce the GYAFC dataset: Corpus, benchmarks and metrics for formality style transfer

    Rao, S. and Tetreault, J. R. Dear sir or madam, may I introduce the GYAFC dataset: Corpus, benchmarks and metrics for formality style transfer. In North American Chapter of the Association for Computational Linguistics, 2018

  26. [43]

    Schmidhuber, J. Driven by compression progress: A simple principle explains essential aspects of subjective beauty, novelty, surprise, interestingness, attention, curiosity, creativity, art, science, music, jokes. In Pezzulo, G., Butz, M. V., Sigaud, O., and Baldassarre, G. (eds.), Anticipatory Behavior in Adaptive Learning Systems, pp. 48–76, Berlin, H...

  27. [45]

    Silver, R. A. Neuronal arithmetic. Nature Reviews Neuroscience, 11:474–489, 2010. URL https://api.semanticscholar.org/CorpusID:205505926

  28. [47]

    Recursive deep models for semantic compositionality over a sentiment treebank

    Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C. D., Ng, A., and Potts, C. Recursive deep models for semantic compositionality over a sentiment treebank. In Yarowsky, D., Baldwin, T., Korhonen, A., Livescu, K., and Bethard, S. (eds.), Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1631–1642, Seattle...

  29. [50]

    An observation on generalization

    Sutskever, I. An observation on generalization. Simons Institute workshop on Large Language Models and Transformers, 2023. URL https://simons.berkeley.edu/talks/ilya-sutskever-openai-2023-08-14

  30. [51]

    HydraLoRA: An asymmetric LoRA architecture for efficient fine-tuning

    Tian, C., Shi, Z., Guo, Z., Li, L., and Xu, C.-Z. HydraLoRA: An asymmetric LoRA architecture for efficient fine-tuning. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. URL https://openreview.net/forum?id=qEpi8uWX3N

  31. [52]

    Deep learning and the information bottleneck principle

    Tishby, N. and Zaslavsky, N. Deep learning and the information bottleneck principle. In 2015 IEEE Information Theory Workshop (ITW), pp. 1–5, 2015. doi:10.1109/ITW.2015.7133169

  32. [56]

    Super-NaturalInstructions: Generalization via declarative instructions on 1600+ NLP tasks

    Wang, Y., Mishra, S., Alipoormolabashi, P., Kordi, Y., Mirzaei, A., Naik, A., Ashok, A., Dhanasekaran, A. S., Arunkumar, A., Stap, D., et al. Super-NaturalInstructions: Generalization via declarative instructions on 1600+ NLP tasks. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pp. 5085–5109, 2022b

  33. [57]

    Chain-of-thought prompting elicits reasoning in large language models

    Wei, J., Wang, X., Schuurmans, D., Bosma, M., Xia, F., Chi, E., Le, Q. V., Zhou, D., et al. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35:24824–24837, 2022

  34. [61]

    HellaSwag: Can a machine really finish your sentence?

    Zellers, R., Holtzman, A., Bisk, Y., Farhadi, A., and Choi, Y. HellaSwag: Can a machine really finish your sentence? In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019

  35. [65]

    Llama 3 model card

    AI@Meta. Llama 3 model card. 2024. URL https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md

  36. [66]

    The Llama 3 herd of models

    Grattafiori, A. et al. The Llama 3 herd of models, 2024

  37. [67]

    Qwen2 Technical Report

    Qwen Team, Alibaba Group. Qwen2 Technical Report. Technical report, 2024

  38. [68]

    Mistral 7B

    Mistral 7B. arXiv preprint arXiv:2310.06825, 2023

  39. [69]

    LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models

    LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models. arXiv preprint arXiv:2403.13372, 2024

  40. [70]

    Efficient Memory Management for Large Language Model Serving with PagedAttention

    Efficient Memory Management for Large Language Model Serving with PagedAttention. In Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles, 2023

  41. [71]

    Metadata Conditioning Accelerates Language Model Pre-training

    Metadata Conditioning Accelerates Language Model Pre-training. In Forty-second International Conference on Machine Learning, 2025

  42. [72]

    HyperNetworks

    Ha, D., Dai, A. M., and Le, Q. V. HyperNetworks. In International Conference on Learning Representations, 2017

  43. [73]

    Reinhart et al., Proceedings of the National Academy of Sciences, 2025

    Reinhart, A., Markey, B., Laudenbach, M., Pantusen, K., Yurko, R., Weinberg, G., and Brown, D. W. Proceedings of the National Academy of Sciences, 2025. URL https://www.pnas.org/doi/pdf/10.1073/pnas.2422455122

  44. [74]

    Comparing LLM-generated and human-authored news text using formal syntactic theory

    Zamaraeva, Olga and Flickinger, Dan and Bond, Francis and Gómez-Rodríguez, Carlos. Comparing LLM-generated and human-authored news text using formal syntactic theory. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2025. doi:10.18653/v1/2025.acl-long.443

  45. [75]

    Substance over Style: Evaluating Proactive Conversational Coaching Agents

    Srinivas, Vidya and Xu, Xuhai and Liu, Xin and Ayush, Kumar and Galatzer-Levy, Isaac and Patel, Shwetak and McDuff, Daniel and Althoff, Tim. Substance over Style: Evaluating Proactive Conversational Coaching Agents. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2025. doi:10.18653/v1/2025.a...

  46. [76]

    A framework for authorship identification of online messages: Writing-style features and classification techniques

    A framework for authorship identification of online messages: Writing-style features and classification techniques. 2006

  47. [77]

    Stylized Dialogue Generation with Feature-Guided Knowledge Augmentation

    Li, Jinpeng and Zhang, Zekai and Chen, Xiuying and Zhao, Dongyan and Yan, Rui. Stylized Dialogue Generation with Feature-Guided Knowledge Augmentation. Findings of the Association for Computational Linguistics: EMNLP 2023. 2023. doi:10.18653/v1/2023.findings-emnlp.475

  48. [78]

    Stylistic Response Generation by Controlling Personality Traits and Intent

    Saha, Sougata and Das, Souvik and Srihari, Rohini. Stylistic Response Generation by Controlling Personality Traits and Intent. Proceedings of the 4th Workshop on NLP for Conversational AI. 2022. doi:10.18653/v1/2022.nlp4convai-1.16

  49. [79]

    StyleDGPT: Stylized Response Generation with Pre-trained Language Models

    Yang, Ze and Wu, Wei and Xu, Can and Liang, Xinnian and Bai, Jiaqi and Wang, Liran and Wang, Wei and Li, Zhoujun. StyleDGPT: Stylized Response Generation with Pre-trained Language Models. Findings of the Association for Computational Linguistics: EMNLP 2020. 2020. doi:10.18653/v1/2020.findings-emnlp.140

  50. [80]

    Paraphrasing for Style

    Paraphrasing for Style. In COLING

  51. [81]

    Dear Sir or Madam, May I Introduce the GYAFC Dataset: Corpus, Benchmarks and Metrics for Formality Style Transfer

    Rao, S. and Tetreault, J. R. Dear Sir or Madam, May I Introduce the GYAFC Dataset: Corpus, Benchmarks and Metrics for Formality Style Transfer. In North American Chapter of the Association for Computational Linguistics, 2018

  52. [82]

    The Moral Integrity Corpus: A Benchmark for Ethical Dialogue Systems

    Ziems, Caleb and Yu, Jane and Wang, Yi-Chia and Halevy, Alon and Yang, Diyi. The Moral Integrity Corpus: A Benchmark for Ethical Dialogue Systems. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022. doi:10.18653/v1/2022.acl-long.261

  53. [83]

    Towards emotional support dialog systems

    Liu, Siyang and Zheng, Chujie and Demasi, Orianna and Sabour, Sahand and Li, Yu and Yu, Zhou and Jiang, Yong and Huang, Minlie. Towards Emotional Support Dialog Systems. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)....

  54. [84]

    DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset

    Li, Yanran and Su, Hui and Shen, Xiaoyu and Li, Wenjie and Cao, Ziqiang and Niu, Shuzi. DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset. Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2017

  55. [85]

    Emotion Detection on TV Show Transcripts with Sequence-Based Convolutional Neural Networks

    Emotion Detection on TV Show Transcripts with Sequence-Based Convolutional Neural Networks. AAAI Workshops

  56. [86]

    Towards Empathetic Open-domain Conversation Models: A New Benchmark and Dataset

    Rashkin, Hannah and Smith, Eric Michael and Li, Margaret and Boureau, Y-Lan. Towards Empathetic Open-domain Conversation Models: A New Benchmark and Dataset. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019. doi:10.18653/v1/P19-1534

  57. [87]

    Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank

    Socher, Richard and Perelygin, Alex and Wu, Jean and Chuang, Jason and Manning, Christopher D. and Ng, Andrew and Potts, Christopher. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. 2013

  58. [88]

    Learning Word Vectors for Sentiment Analysis

    Maas, Andrew L. and Daly, Raymond E. and Pham, Peter T. and Huang, Dan and Ng, Andrew Y. and Potts, Christopher. Learning Word Vectors for Sentiment Analysis. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. 2011

  59. [89]

    Sentiment analysis using product review data

    Sentiment analysis using product review data. Journal of Big Data

  60. [90]

    Adaptive Prompt Routing for Arbitrary Text Style Transfer with Pre-trained Language Models

    Adaptive Prompt Routing for Arbitrary Text Style Transfer with Pre-trained Language Models. In AAAI Conference on Artificial Intelligence

  61. [91]

    BLEU: a method for automatic evaluation of machine translation

    Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting of the Association for Computational Linguistics, pp. 311–318, 2002

  62. [92]

    ROUGE: A package for automatic evaluation of summaries

    Lin, C.-Y. ROUGE: A package for automatic evaluation of summaries. In Text summarization branches out, pp. 74–81, 2004

  63. [93]

    A Diversity-Promoting Objective Function for Neural Conversation Models

    A Diversity-Promoting Objective Function for Neural Conversation Models. arXiv preprint arXiv:1510.03055, 2015

  64. [94]

    An Integrative Theory of Locus Coeruleus-Norepinephrine Function: Adaptive Gain and Optimal Performance

    Aston-Jones, Gary and Cohen, Jonathan D. An Integrative Theory of Locus Coeruleus-Norepinephrine Function: Adaptive Gain and Optimal Performance. Annual Review of Neuroscience. 2005. doi:10.1146/annurev.neuro.28.061604.135709

  65. [95]

    Servan-Schreiber, Printz, and Cohen, Science, 1990

    Servan-Schreiber, David and Printz, Harry and Cohen, Jonathan D. Science, 1990. URL https://www.science.org/doi/pdf/10.1126/science.2392679

  66. [96]

    Neuronal arithmetic

    Silver, R. A. Neuronal arithmetic. Nature Reviews Neuroscience, 11:474–489, 2010

  67. [97]

    MetaGater: Fast Learning of Conditional Channel Gated Networks via Federated Meta-Learning

    Lin, Sen and Yang, Li and He, Zhezhi and Fan, Deliang and Zhang, Junshan. MetaGater: Fast Learning of Conditional Channel Gated Networks via Federated Meta-Learning. 2021

  68. [98]

    Learning to Optimize Resource in Dynamic Wireless Environment via Meta-Gating Graph Neural Network

    Hou, Qiushuo and Lee, Mengyuan and Yu, Guanding and Cai, Yunlong. Learning to Optimize Resource in Dynamic Wireless Environment via Meta-Gating Graph Neural Network. 2022

  69. [99]

    Meta-Gating Framework for Fast and Continuous Resource Optimization in Dynamic Wireless Environments

    Hou, Qiushuo and Lee, Mengyuan and Yu, Guanding and Cai, Yunlong. Meta-Gating Framework for Fast and Continuous Resource Optimization in Dynamic Wireless Environments. IEEE Transactions on Communications, 2023

  70. [100]

    Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free

    Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

  71. [101]

    CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP

    Ye, Qinyuan and Lin, Bill Yuchen and Ren, Xiang. CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 2021. doi:10.18653/v1/2021.emnlp-main.572

  72. [102]

    UnifiedQA: Crossing Format Boundaries with a Single QA System

    Khashabi, Daniel and Min, Sewon and Khot, Tushar and Sabharwal, Ashish and Tafjord, Oyvind and Clark, Peter and Hajishirzi, Hannaneh. UnifiedQA: Crossing Format Boundaries with a Single QA System. Findings of the Association for Computational Linguistics: EMNLP 2020. 2020. doi:10.18653/v1/2020.findings-emnlp.171

  73. [103]

    Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks

    Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pp. 5085–5109, 2022

  74. [104]

    Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks

    Wang, Yizhong and Mishra, Swaroop and Alipoormolabashi, Pegah and Kordi, Yeganeh and Mirzaei, Amirreza and Naik, Atharva and Ashok, Arjun and Dhanasekaran, Arut Selvan and Arunkumar, Anjana and Stap, David and Pathak, Eshaan and Karamanolakis, Giannis and Lai, Haizhi and Purohit, Ishan and Mondal, Ishani and Anderson, Jacob and Kuznia, Kirby and Doshi, Kr...

  75. [105]

    Personalizing Dialogue Agents: I have a dog, do you have pets too?

    Zhang, Saizheng and Dinan, Emily and Urbanek, Jack and Szlam, Arthur and Kiela, Douwe and Weston, Jason. Personalizing Dialogue Agents: I have a dog, do you have pets too?. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2018. doi:10.18653/v1/P18-1205

  76. [106]

    AdaptSum: Towards Low-Resource Domain Adaptation for Abstractive Summarization

    Yu, Tiezheng and Liu, Zihan and Fung, Pascale. AdaptSum: Towards Low-Resource Domain Adaptation for Abstractive Summarization. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021. doi:10.18653/v1/2021.naacl-main.471

  77. [107]

    Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge

    Clark, Peter and Cowhey, Isaac and Etzioni, Oren and Khot, Tushar and Sabharwal, Ashish and Schoenick, Carissa and Tafjord, Oyvind. Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. arXiv:1803.05457v1, 2018

  78. [108]

    BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions

    BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions. arXiv preprint arXiv:1905.10044, 2019

  79. [109]

    Training Verifiers to Solve Math Word Problems

    Training Verifiers to Solve Math Word Problems. arXiv preprint arXiv:2110.14168, 2021

  80. [110]

    HellaSwag: Can a Machine Really Finish Your Sentence?

    Zellers, R., Holtzman, A., Bisk, Y., Farhadi, A., and Choi, Y. HellaSwag: Can a Machine Really Finish Your Sentence? In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019

Showing first 80 references.