A Call for Clarity in Reporting BLEU Scores
10 Pith papers cite this work.
abstract
The field of machine translation faces an under-recognized problem because of inconsistency in the reporting of scores from its dominant metric. Although people refer to "the" BLEU score, BLEU is in fact a parameterized metric whose values can vary wildly with changes to these parameters. These parameters are often not reported or are hard to find, and consequently, BLEU scores between papers cannot be directly compared. I quantify this variation, finding differences as high as 1.8 between commonly used configurations. The main culprit is different tokenization and normalization schemes applied to the reference. Pointing to the success of the parsing community, I suggest machine translation researchers settle upon the BLEU scheme used by the annual Conference on Machine Translation (WMT), which does not allow for user-supplied reference processing, and provide a new tool, SacreBLEU, to facilitate this.
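The abstract's central point, that BLEU depends on how the text is tokenized before scoring, can be illustrated with a small sketch. This is a toy single-sentence, single-reference BLEU written for illustration only; it is not SacreBLEU's implementation and does not reproduce the paper's measurements. The function names (`bleu`, `whitespace_tok`, `punct_tok`) and the example sentences are assumptions invented for this sketch.

```python
import math
import re
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(hyp_tokens, ref_tokens, max_n=4):
    """Toy BLEU (0-100) for one hypothesis and one reference, no smoothing."""
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp_tokens, n)
        ref_counts = ngrams(ref_tokens, n)
        # Clipped matches: each hypothesis n-gram counts at most as often
        # as it occurs in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if matched == 0 or total == 0:
            return 0.0
        log_precision_sum += math.log(matched / total) / max_n
    # Brevity penalty: penalize hypotheses shorter than the reference.
    if len(hyp_tokens) > len(ref_tokens):
        bp = 1.0
    else:
        bp = math.exp(1 - len(ref_tokens) / len(hyp_tokens))
    return 100 * bp * math.exp(log_precision_sum)


def whitespace_tok(text):
    """Hypothetical scheme 1: split on whitespace, punctuation stays attached."""
    return text.split()


def punct_tok(text):
    """Hypothetical scheme 2: split punctuation into separate tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


# Invented example sentences (not from the paper).
hyp = "The cat sat on the mat, and it purred."
ref = "The cat sat on the mat, then it purred."

score_ws = bleu(whitespace_tok(hyp), whitespace_tok(ref))
score_punct = bleu(punct_tok(hyp), punct_tok(ref))

print(f"whitespace tokenization:  {score_ws:.2f}")
print(f"punctuation tokenization: {score_punct:.2f}")
```

The same hypothesis and reference receive different scores under the two schemes, because splitting punctuation changes both the number of n-grams and which of them match. This is the kind of unreported parameter the abstract argues makes scores from different papers incomparable.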
citing papers explorer
- Language Models are Few-Shot Learners
  GPT-3 shows that scaling an autoregressive language model to 175 billion parameters enables strong few-shot performance across diverse NLP tasks via in-context prompting without fine-tuning.
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
  T5 casts all NLP tasks as text-to-text generation, systematically explores pre-training choices, and reaches strong performance on summarization, QA, classification and other tasks via large-scale training on the Colossal Clean Crawled Corpus.
- SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing
  SentencePiece trains subword models directly from raw text to enable language-independent neural text processing.
- Widening the Representation Bottleneck in Neural Machine Translation with Lexical Shortcuts
  Gated lexical shortcut connections added to the transformer yield 0.9 BLEU average gains on five WMT directions while lowering the lexical content stored in hidden states.
- AITP: Traffic Accident Responsibility Allocation via Multimodal Large Language Models
  AITP is a new multimodal large language model that uses multimodal chain-of-thought and retrieval-augmented generation of legal knowledge to achieve state-of-the-art results on traffic accident responsibility allocation and related tasks, supported by the DecaTARA benchmark of 67,941 videos.
- Root Mean Square Layer Normalization
  RMSNorm delivers re-scaling invariance and comparable accuracy to LayerNorm while cutting computation by skipping mean subtraction, yielding 7-64% runtime reductions across tested models.
- Low-Resource Corpus Filtering using Multilingual Sentence Embeddings
  LASER sentence embeddings are applied directly to filter parallel corpora, achieving the best BLEU scores in the WMT19 low-resource tasks for Nepali-English and Sinhala-English by margins of 1.3 and 1.4.
- Enhancing Scientific Discourse: Machine Translation for the Scientific Domain
  Development of domain-specific scientific corpora for English-Spanish, English-French, and English-Portuguese and their application to fine-tuning NMT models.
- Analyzing Chain of Thought (CoT) Approaches in Control Flow Code Deobfuscation Tasks
  CoT prompting improves LLM performance on control-flow deobfuscation of C benchmarks, yielding ~16% better CFG reconstruction and ~20.5% better semantic preservation for GPT5 versus zero-shot prompting.
- Robust Machine Translation with Domain Sensitive Pseudo-Sources: Baidu-OSU WMT19 MT Robustness Shared Task System Report
  Baidu-OSU WMT19 system achieves >10 BLEU gain on En-Fr and Fr-En social media translation via domain sensitive training and pseudo noisy sources.