Factual associations in autoregressive transformers are localized to mid-layer feed-forward modules and can be edited via rank-one model editing while preserving both specificity and generalization on counterfactual tests.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing
8 Pith papers cite this work.
citing papers explorer
- Locating and Editing Factual Associations in GPT
  Factual associations in autoregressive transformers are localized to mid-layer feed-forward modules and can be edited via rank-one model editing while preserving both specificity and generalization on counterfactual tests.
- Norm Anchors Make Model Edits Last
  Norm-Anchor Scaling breaks the norm-feedback loop in sequential LLM editing by anchoring value vectors to original norms, improving long-run performance by 72.2% and extending the editing horizon over 4x.
- Multitask Prompted Training Enables Zero-Shot Task Generalization
  Multitask fine-tuning of an encoder-decoder model on prompted datasets produces zero-shot generalization that often beats models up to 16 times larger on standard benchmarks.
- Perhaps PTLMs Should Go to School -- A Task to Assess Open Book and Closed Book QA
  Proposes a textbook-based true/false QA task where PTLMs score ~50% closed-book even after pre-training on the text and ~60% open-book with retrieval.
- RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
  RAPTOR introduces a tree-organized retrieval method using recursive abstractive summaries, achieving a 20% absolute accuracy improvement on the QuALITY benchmark when paired with GPT-4.
- Gemini: A Family of Highly Capable Multimodal Models
  Gemini Ultra reaches human-expert performance on MMLU for the first time and sets new state-of-the-art results on 30 of 32 benchmarks, including all 20 multimodal ones tested.
- LLM-Metrics: Measuring Research Impact Through Large Language Model Memory
  LLM-Metrics probes the memory of 17 LLMs for 549 CS papers from 2023-2024 and finds a modest Spearman correlation (rho=0.1495) with citation counts, stronger for 2024 papers.
- PaLM 2 Technical Report
  PaLM 2 reports state-of-the-art results on language, reasoning, and multilingual tasks with improved efficiency over PaLM.