Opt: Open pre-trained transformer language models

Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, Todor Mihaylov, Myle Ott, Sam Shleifer, Kurt Shuster, Daniel Simig, Punit Singh Koura, Anjali Sridhar · 2022

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

browse 4 citing papers

representative citing papers

What learning algorithm is in-context learning? Investigations with linear models

cs.LG · 2022-11-28 · accept · novelty 8.0

Transformers performing in-context learning implicitly implement gradient descent, ridge regression, and least-squares predictors for linear models, with behavior shifting based on model depth, width, and data noise.

LogicVista: Multimodal LLM Logical Reasoning Benchmark in Visual Contexts

cs.AI · 2024-07-06 · conditional · novelty 6.0

LogicVista is a new benchmark dataset with 448 visual logic questions that evaluates multimodal LLMs on five reasoning tasks covering nine capabilities.

Efficient Streaming Language Models with Attention Sinks

cs.CL · 2023-09-29 · accept · novelty 6.0

StreamingLLM lets finite-window LLMs generalize to infinite-length sequences by retaining initial-token KV states as attention sinks, enabling stable streaming inference up to 4M tokens.

FP8 Formats for Deep Learning

cs.LG · 2022-09-12 · unverdicted · novelty 6.0

FP8 formats E4M3 and E5M2 match 16-bit training accuracy on CNNs, RNNs, and Transformers up to 175B parameters without hyperparameter changes.

citing papers explorer

Showing 4 of 4 citing papers.

What learning algorithm is in-context learning? Investigations with linear models cs.LG · 2022-11-28 · accept · none · ref 31
Transformers performing in-context learning implicitly implement gradient descent, ridge regression, and least-squares predictors for linear models, with behavior shifting based on model depth, width, and data noise.
LogicVista: Multimodal LLM Logical Reasoning Benchmark in Visual Contexts cs.AI · 2024-07-06 · conditional · none · ref 35
LogicVista is a new benchmark dataset with 448 visual logic questions that evaluates multimodal LLMs on five reasoning tasks covering nine capabilities.
Efficient Streaming Language Models with Attention Sinks cs.CL · 2023-09-29 · accept · none · ref 61
StreamingLLM lets finite-window LLMs generalize to infinite-length sequences by retaining initial-token KV states as attention sinks, enabling stable streaming inference up to 4M tokens.
FP8 Formats for Deep Learning cs.LG · 2022-09-12 · unverdicted · none · ref 25
FP8 formats E4M3 and E5M2 match 16-bit training accuracy on CNNs, RNNs, and Transformers up to 175B parameters without hyperparameter changes.

Opt: Open pre-trained transformer language models

fields

years

verdicts

representative citing papers

citing papers explorer