pith. sign in

hub Mixed citations

Sentencepiece: A simple and language independent subword tokenizer and detokenizer for neural text processing

Mixed citation behavior. Most common role is background (44%).

54 Pith papers citing it
1,189 external citations · Crossref
Background 44% of classified citations
abstract

This paper describes SentencePiece, a language-independent subword tokenizer and detokenizer designed for Neural-based text processing, including Neural Machine Translation. It provides open-source C++ and Python implementations for subword units. While existing subword segmentation tools assume that the input is pre-tokenized into word sequences, SentencePiece can train subword models directly from raw sentences, which allows us to make a purely end-to-end and language independent system. We perform a validation experiment of NMT on English-Japanese machine translation, and find that it is possible to achieve comparable accuracy to direct subword training from raw sentences. We also compare the performance of subword training and segmentation with various configurations. SentencePiece is available under the Apache 2 license at https://github.com/google/sentencepiece.

hub tools

citation-role summary

background 4 method 3 dataset 1 other 1

citation-polarity summary

clear filters

representative citing papers

MultiHashFormer: Hash-based Generative Language Models

cs.CL · 2026-06-26 · unverdicted · novelty 7.0

MultiHashFormer enables hash-based autoregression in LMs by encoding tokens as multi-hash signatures, outperforming standard Transformers at 100M-3B scales while keeping parameter count constant for multilingual expansion.

Moshi: a speech-text foundation model for real-time dialogue

eess.AS · 2024-09-17 · accept · novelty 7.0

Moshi is the first real-time full-duplex spoken large language model that casts dialogue as speech-to-speech generation using parallel audio streams and an inner monologue of time-aligned text tokens.

OPT: Open Pre-trained Transformer Language Models

cs.CL · 2022-05-02 · unverdicted · novelty 7.0

OPT releases open decoder-only transformers up to 175B parameters that match GPT-3 performance at one-seventh the carbon cost, along with code and training logs.

Rethinking Attention with Performers

cs.LG · 2020-09-30 · unverdicted · novelty 7.0

Performers approximate full-rank softmax attention in Transformers via FAVOR+ random features for linear complexity, with theoretical guarantees of unbiased estimation and competitive results on pixel, text, and protein tasks.

citing papers explorer

Showing 6 of 6 citing papers after filters.

  • Extending Context Window of Large Language Models via Positional Interpolation cs.CL · 2023-06-27 · conditional · none · ref 11 · internal anchor

    Position Interpolation linearly down-scales position indices to extend RoPE context windows to 32768 tokens with 1000-step fine-tuning, delivering strong long-context results on LLaMA 7B-65B while preserving short-context quality.

  • Accelerating Vision Transformers with Adaptive Patch Sizes cs.CV · 2025-10-20 · conditional · none · ref 11 · internal anchor

    APT adaptively varies patch sizes within a single image to reduce ViT token count, delivering 40-50% throughput gains on large models with no downstream performance loss.

  • Gemini: A Family of Highly Capable Multimodal Models cs.CL · 2023-12-19 · conditional · none · ref 47 · internal anchor

    Gemini Ultra reaches human-expert performance on MMLU for the first time and sets new state-of-the-art results on 30 of 32 benchmarks, including all 20 multimodal ones tested.

  • BloombergGPT: A Large Language Model for Finance cs.LG · 2023-03-30 · conditional · none · ref 55 · internal anchor

    BloombergGPT is a 50B parameter LLM trained on a 708B token mixed financial and general dataset that outperforms prior models on financial benchmarks while preserving general LLM performance.

  • CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation cs.CL · 2021-09-02 · conditional · none · ref 71 · internal anchor

    CodeT5 adds identifier-aware pre-training and bimodal dual generation to a T5-style encoder-decoder, yielding better results on defect detection, clone detection, and code-to-text, text-to-code, and code-to-code tasks than prior encoder-only or decoder-only models.

  • Gemma 2: Improving Open Language Models at a Practical Size cs.CL · 2024-07-31 · conditional · none · ref 112 · internal anchor

    Gemma 2 models achieve leading performance at their sizes by combining established Transformer modifications with knowledge distillation for the 2B and 9B variants.