pith. sign in

hub Mixed citations

Sentencepiece: A simple and language independent subword tokenizer and detokenizer for neural text processing

Mixed citation behavior. Most common role is background (44%).

54 Pith papers citing it
1,189 external citations · Crossref
Background 44% of classified citations
abstract

This paper describes SentencePiece, a language-independent subword tokenizer and detokenizer designed for Neural-based text processing, including Neural Machine Translation. It provides open-source C++ and Python implementations for subword units. While existing subword segmentation tools assume that the input is pre-tokenized into word sequences, SentencePiece can train subword models directly from raw sentences, which allows us to make a purely end-to-end and language independent system. We perform a validation experiment of NMT on English-Japanese machine translation, and find that it is possible to achieve comparable accuracy to direct subword training from raw sentences. We also compare the performance of subword training and segmentation with various configurations. SentencePiece is available under the Apache 2 license at https://github.com/google/sentencepiece.

hub tools

citation-role summary

background 4 method 3 dataset 1 other 1

citation-polarity summary

clear filters

representative citing papers

MultiHashFormer: Hash-based Generative Language Models

cs.CL · 2026-06-26 · unverdicted · novelty 7.0

MultiHashFormer enables hash-based autoregression in LMs by encoding tokens as multi-hash signatures, outperforming standard Transformers at 100M-3B scales while keeping parameter count constant for multilingual expansion.

Moshi: a speech-text foundation model for real-time dialogue

eess.AS · 2024-09-17 · accept · novelty 7.0

Moshi is the first real-time full-duplex spoken large language model that casts dialogue as speech-to-speech generation using parallel audio streams and an inner monologue of time-aligned text tokens.

OPT: Open Pre-trained Transformer Language Models

cs.CL · 2022-05-02 · unverdicted · novelty 7.0

OPT releases open decoder-only transformers up to 175B parameters that match GPT-3 performance at one-seventh the carbon cost, along with code and training logs.

Rethinking Attention with Performers

cs.LG · 2020-09-30 · unverdicted · novelty 7.0

Performers approximate full-rank softmax attention in Transformers via FAVOR+ random features for linear complexity, with theoretical guarantees of unbiased estimation and competitive results on pixel, text, and protein tasks.

citing papers explorer

Showing 6 of 6 citing papers after filters.

  • Moshi: a speech-text foundation model for real-time dialogue eess.AS · 2024-09-17 · accept · none · ref 50 · internal anchor

    Moshi is the first real-time full-duplex spoken large language model that casts dialogue as speech-to-speech generation using parallel audio streams and an inner monologue of time-aligned text tokens.

  • ALBERT: A Lite BERT for Self-supervised Learning of Language Representations cs.CL · 2019-09-26 · accept · none · ref 21 · internal anchor

    ALBERT reduces BERT parameters via embedding factorization and layer sharing, adds inter-sentence coherence pretraining, and reaches SOTA on GLUE, RACE, and SQuAD with fewer parameters than BERT-large.

  • XLNet: Generalized Autoregressive Pretraining for Language Understanding cs.CL · 2019-06-19 · accept · none · ref 17 · internal anchor

    XLNet is a generalized autoregressive pretraining method that learns bidirectional contexts via permutation-based factorization and outperforms BERT on 20 NLP tasks.

  • An Empirical Study of Mamba-based Language Models cs.LG · 2024-06-12 · accept · none · ref 28 · internal anchor

    An 8B Mamba-2-Hybrid with 43% Mamba-2, 7% attention, and 50% MLP layers exceeds an 8B Transformer by 2.65 points on average across 12 tasks and matches it on 23 long-context tasks while enabling up to 8x faster inference.

  • PaLM: Scaling Language Modeling with Pathways cs.CL · 2022-04-05 · accept · none · ref 74 · internal anchor

    PaLM 540B demonstrates continued scaling benefits by setting new few-shot SOTA results on hundreds of benchmarks and outperforming humans on BIG-bench.

  • Gemma: Open Models Based on Gemini Research and Technology cs.CL · 2024-03-13 · accept · none · ref 100 · internal anchor

    Gemma introduces open 2B and 7B LLMs derived from Gemini technology that beat comparable open models on 11 of 18 text tasks and come with safety assessments.