No Language Left Behind: Scaling Human-Centered Machine Translation
Pith reviewed 2026-05-12 17:48 UTC · model grok-4.3
The pith
A sparsely gated mixture of experts model trained on mined low-resource data achieves a 44% relative BLEU improvement in translating 200 languages.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A conditional compute model based on a sparsely gated mixture of experts, trained with new data mining techniques for low-resource languages and with added safeguards against overfitting, raises BLEU scores by 44% relative to the prior state of the art while passing human quality and toxicity evaluations across more than 40,000 translation directions.
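The headline figure is a relative, not absolute, BLEU gain. A quick sanity check of what that means; the BLEU values below are hypothetical placeholders, not numbers from the paper:

```python
# Relative BLEU improvement as in the abstract: (new - old) / old, in percent.
# All concrete scores here are illustrative, not the paper's.

def relative_improvement(new_bleu: float, old_bleu: float) -> float:
    """Return the relative gain of new_bleu over old_bleu as a percentage."""
    if old_bleu <= 0:
        raise ValueError("baseline BLEU must be positive")
    return 100.0 * (new_bleu - old_bleu) / old_bleu

# A 44% relative gain corresponds to e.g. a baseline of 25.0 BLEU rising to 36.0.
print(relative_improvement(36.0, 25.0))  # 44.0
```

The same relative figure can therefore correspond to very different absolute gains depending on the baseline, which is why the per-direction breakdown the referee asks for matters.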
What carries the argument
The sparsely gated mixture of experts architecture, which routes each input to a small subset of experts, combined with tailored data mining that targets low-resource languages.
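A minimal sketch of what "routes each input to a small subset of experts" means, assuming the standard top-k gating formulation; the sizes and random gating weights are illustrative, not the paper's configuration:

```python
import math
import random

# Top-2 sparsely gated routing: score all experts, keep the best two,
# and renormalize their weights. Sizes and weights are random stand-ins.

random.seed(0)
NUM_EXPERTS, D_MODEL = 8, 16
# One learned score column per expert (here: random placeholders).
W_GATE = [[random.gauss(0, 1) for _ in range(NUM_EXPERTS)] for _ in range(D_MODEL)]

def route(x, k=2):
    """Return (expert indices, softmax weights) of the k highest-scoring experts."""
    logits = [sum(x[d] * W_GATE[d][e] for d in range(D_MODEL))
              for e in range(NUM_EXPERTS)]
    top_k = sorted(range(NUM_EXPERTS), key=lambda e: logits[e])[-k:]
    exps = [math.exp(logits[e]) for e in top_k]
    total = sum(exps)
    return top_k, [w / total for w in exps]  # softmax over the selected experts only

token = [random.gauss(0, 1) for _ in range(D_MODEL)]
experts, weights = route(token)
assert len(experts) == 2 and abs(sum(weights) - 1.0) < 1e-9
```

Because only k of the experts run per token, parameter count can grow with the number of experts while per-token compute stays roughly constant, which is the "conditional compute" property the core claim rests on.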
If this is right
- Thousands of previously unsupported translation directions become accurate enough for everyday use.
- A combined human-quality and toxicity benchmark becomes a standard way to judge multilingual systems.
- Releasing the model and mined data lets others add still more languages without starting from scratch.
Where Pith is reading between the lines
- Conditional routing of computation could be applied to scale speech recognition or summarization to the same wide language set.
- Gathering direct input from native speakers before model design offers a repeatable way to keep other multilingual tools grounded in actual user needs.
- Further increases in model size and data coverage could test whether the same techniques eventually support reliable translation even for languages with almost no written data.
Load-bearing premise
The new data mining and model changes produce genuinely better and safer translations instead of merely matching the new benchmark or the particular human raters used.
What would settle it
An independent human evaluation on a fresh set of low-resource sentence pairs that finds no relative BLEU gain or that finds higher rates of toxic outputs.
Original abstract
Driven by the goal of eradicating language barriers on a global scale, machine translation has solidified itself as a key focus of artificial intelligence research today. However, such efforts have coalesced around a small subset of languages, leaving behind the vast majority of mostly low-resource languages. What does it take to break the 200 language barrier while ensuring safe, high quality results, all while keeping ethical considerations in mind? In No Language Left Behind, we took on this challenge by first contextualizing the need for low-resource language translation support through exploratory interviews with native speakers. Then, we created datasets and models aimed at narrowing the performance gap between low and high-resource languages. More specifically, we developed a conditional compute model based on Sparsely Gated Mixture of Experts that is trained on data obtained with novel and effective data mining techniques tailored for low-resource languages. We propose multiple architectural and training improvements to counteract overfitting while training on thousands of tasks. Critically, we evaluated the performance of over 40,000 different translation directions using a human-translated benchmark, Flores-200, and combined human evaluation with a novel toxicity benchmark covering all languages in Flores-200 to assess translation safety. Our model achieves an improvement of 44% BLEU relative to the previous state-of-the-art, laying important groundwork towards realizing a universal translation system. Finally, we open source all contributions described in this work, accessible at https://github.com/facebookresearch/fairseq/tree/nllb.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents the No Language Left Behind (NLLB) project, which develops machine translation systems supporting 200 languages with emphasis on low-resource directions. It begins with interviews of native speakers, introduces novel data-mining techniques for low-resource data, trains a Sparsely Gated Mixture-of-Experts model augmented with architectural and training changes to mitigate overfitting across thousands of tasks, and evaluates performance on the human-translated Flores-200 benchmark. The central empirical claim is a 44% relative BLEU improvement over prior state-of-the-art, supported by large-scale human evaluation and a new toxicity benchmark covering all 200 languages; all models, code, and data-mining pipelines are open-sourced.
Significance. If the reported gains prove robust and attributable to the proposed methods rather than data or evaluation artifacts, the work would constitute a substantial advance toward inclusive, high-coverage MT. Strengths include the scale of human evaluation, the introduction of a toxicity benchmark for safety assessment across all languages, the human-centered framing via speaker interviews, and the commitment to open-sourcing. These elements provide concrete resources that could accelerate follow-on research on low-resource translation.
major comments (3)
- [Abstract] The headline claim of a 44% relative BLEU improvement is presented without any ablation that isolates the contribution of the Sparsely Gated MoE architecture and anti-overfitting training changes from the novel data-mining pipeline. Because the test set (Flores-200) is human-translated and the training data are mined from the same broad web sources, the numerical gain cannot yet be confidently attributed to the model innovations rather than to improved data quality or distributional overlap.
- [Abstract] The aggregate 44% BLEU figure is reported over more than 40,000 directions, yet no per-language variance, confidence intervals, or statistical significance tests for the relative improvement are referenced. Without these controls it is impossible to determine whether the headline number is driven by a small number of high-resource directions or reflects consistent gains on the low-resource languages that motivate the work.
- [Abstract] The toxicity benchmark is described as covering all Flores-200 languages and is used to assess translation safety, but the abstract supplies no information on the toxicity classifier, annotation protocol, or decision thresholds. This omission is load-bearing for the safety claims that accompany the performance numbers.
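The second comment asks for confidence intervals over the per-direction gains. A percentile bootstrap over per-direction BLEU deltas is one standard way to produce them; the deltas below are synthetic, not the paper's:

```python
import random

# Percentile bootstrap CI for the mean per-direction BLEU delta (new - old).
# The deltas are synthetic stand-ins for the >40,000 directions.

random.seed(0)
deltas = [random.gauss(2.0, 3.0) for _ in range(1000)]

def bootstrap_ci(values, n_resamples=2000, alpha=0.05):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    means = []
    for _ in range(n_resamples):
        sample = random.choices(values, k=len(values))  # resample with replacement
        means.append(sum(sample) / len(sample))
    means.sort()
    lo = means[int(alpha / 2 * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

lo, hi = bootstrap_ci(deltas)
assert lo < sum(deltas) / len(deltas) < hi
```

Reporting such an interval per resource tier (low vs. high) would directly answer whether the aggregate number is carried by a few high-resource directions.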
minor comments (1)
- [Abstract] The abstract introduces the model as a 'conditional compute model based on Sparsely Gated Mixture of Experts' without immediately clarifying the relationship between the two phrases; a single sentence linking the terms would improve readability.
Simulated Author's Rebuttal
We thank the referee for their constructive comments on the abstract of our NLLB manuscript. We appreciate the recognition of the project's scale, human evaluation, toxicity benchmark, and open-sourcing. All major comments concern the abstract, which we will revise for greater precision and self-containment while preserving its summary nature. We respond point by point below.
Point-by-point responses
Referee: [Abstract] The headline claim of a 44% relative BLEU improvement is presented without any ablation that isolates the contribution of the Sparsely Gated MoE architecture and anti-overfitting training changes from the novel data-mining pipeline. Because the test set (Flores-200) is human-translated and the training data are mined from the same broad web sources, the numerical gain cannot yet be confidently attributed to the model innovations rather than to improved data quality or distributional overlap.
Authors: We agree that stronger isolation of contributions would be valuable. The manuscript presents the data-mining pipeline and the Sparsely Gated MoE model (with anti-overfitting changes) as complementary elements developed for the 200-language setting, with the 44% gain measured against prior SOTA systems lacking both. The full text contains separate sections detailing each and direct comparisons to prior work. We will revise the abstract to explicitly state that the reported improvement results from the integrated pipeline and to direct readers to the relevant sections for component-wise analysis. Adding exhaustive new ablations at this scale would require substantial additional compute; we therefore treat this as a partial revision focused on abstract clarity. revision: partial
Referee: [Abstract] The aggregate 44% BLEU figure is reported over more than 40,000 directions, yet no per-language variance, confidence intervals, or statistical significance tests for the relative improvement are referenced. Without these controls it is impossible to determine whether the headline number is driven by a small number of high-resource directions or reflects consistent gains on the low-resource languages that motivate the work.
Authors: The full manuscript and appendices report per-language BLEU scores, variance across directions, and human evaluation results demonstrating that gains are largest and most consistent for low-resource languages. Statistical support comes from the scale of the Flores-200 human evaluations. We will revise the abstract to note that the aggregate figure reflects consistent improvements on low-resource directions, as validated by the detailed per-direction and human assessments presented in the body of the paper. revision: yes
Referee: [Abstract] The toxicity benchmark is described as covering all Flores-200 languages and is used to assess translation safety, but the abstract supplies no information on the toxicity classifier, annotation protocol, or decision thresholds. This omission is load-bearing for the safety claims that accompany the performance numbers.
Authors: We agree the abstract should be more self-contained on this point. The manuscript details a multilingual toxicity classifier fine-tuned on human-annotated data collected from native speakers for each language, together with the annotation protocol and thresholds calibrated via human validation. We will revise the abstract to include a concise description of the toxicity benchmark methodology, noting the classifier, native-speaker annotation, and coverage of all 200 languages. revision: yes
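The response above mentions decision thresholds "calibrated via human validation." One minimal sketch of that idea, not the paper's actual procedure: pick the smallest classifier threshold whose precision against human toxicity labels meets a target. All data and names here are hypothetical.

```python
# Hypothetical threshold calibration: choose the lowest toxicity-score cutoff
# whose precision on human-labeled validation data reaches a target.
# Scores and labels are synthetic illustrations.

def calibrate_threshold(scores, labels, target_precision=0.9):
    """Return the smallest threshold whose precision >= target, else None."""
    for t in sorted(set(scores)):
        flagged = [label for s, label in zip(scores, labels) if s >= t]
        if flagged and sum(flagged) / len(flagged) >= target_precision:
            return t
    return None

scores = [0.1, 0.3, 0.5, 0.7, 0.9, 0.95]   # classifier toxicity scores
labels = [0,   0,   1,   0,   1,   1]      # 1 = human-judged toxic
print(calibrate_threshold(scores, labels))  # 0.9
```

Making some such procedure explicit per language in the abstract would address the referee's concern that the safety claim currently rests on unstated calibration choices.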
Circularity Check
No circularity: empirical benchmark result with no self-referential derivations
Full rationale
The paper reports an engineering achievement: a sparsely-gated MoE model trained on newly mined low-resource data, with listed anti-overfitting changes, evaluated on the human-translated Flores-200 benchmark. The 44% relative BLEU figure is a direct measurement on held-out test data rather than a prediction derived from fitted parameters or prior self-citations. No equations, uniqueness theorems, or ansatzes are invoked that reduce the claimed improvement to the inputs by construction. The work is self-contained as an empirical report; any self-citations are incidental and not load-bearing for the central numerical claim.
Axiom & Free-Parameter Ledger
free parameters (2)
- number of experts and expert capacity
- regularization coefficients and training schedule
axioms (2)
- domain assumption: Mined parallel data for low-resource languages is sufficiently clean and representative for supervised training.
- domain assumption: Human raters and the toxicity classifier provide reliable safety signals across all 200 languages.
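The first free parameter, expert capacity, caps how many tokens each expert may process per batch. A sketch of the common capacity-factor convention; the numbers are illustrative, not the paper's settings:

```python
import math

# Expert capacity under the usual capacity-factor convention: tokens that
# overflow an expert's budget are typically dropped or passed through the
# residual connection. The arguments below are illustrative.

def expert_capacity(tokens_per_batch: int, num_experts: int,
                    capacity_factor: float = 1.25) -> int:
    """Maximum tokens each expert accepts in one batch."""
    return math.ceil(capacity_factor * tokens_per_batch / num_experts)

print(expert_capacity(4096, 128))  # 40
```

Both the capacity factor and the number of experts trade load balance against dropped tokens, which is why they belong in the free-parameter ledger.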
Forward citations
Cited by 30 Pith papers
- Knowledge Beyond Language: Bridging the Gap in Multilingual Machine Unlearning Evaluation
  New metrics KSS and KPS are introduced to evaluate multilingual machine unlearning quality and cross-language consistency in LLMs, addressing limitations of single-language evaluation protocols.
- Scratchpad Patching: Decoupling Compute from Patch Size in Byte-Level Language Models
  Scratchpad Patching decouples compute from patch size in byte-level language models by inserting entropy-triggered scratchpads to update patch context dynamically.
- Cross-Attention and Encoder-Decoder Transformers: A Logical Characterization
  Encoder-decoder transformers are characterized by a temporal logic extending propositional logic with a counting global modality on the encoder and a past modality on the decoder, equivalently via distributed automata.
- Exploring Language-Agnosticity in Function Vectors: A Case Study in Machine Translation
  Translation function vectors extracted from English to one target language improve correct token ranking for translations to multiple other unseen target languages in decoder-only multilingual LLMs.
- STAR-Teaming: A Strategy-Response Multiplex Network Approach to Automated LLM Red Teaming
  STAR-Teaming uses a Strategy-Response Multiplex Network inside a multi-agent framework to organize attack strategies into semantic communities, delivering higher attack success rates on LLMs at lower computational cos...
- MORPHOGEN: A Multilingual Benchmark for Evaluating Gender-Aware Morphological Generation
  MORPHOGEN is a new multilingual benchmark for testing LLMs on gender-aware morphological generation via rewriting first-person sentences to the opposite gender in French, Arabic, and Hindi.
- Litmus (Re)Agent: A Benchmark and Agentic System for Predictive Evaluation of Multilingual Models
  Litmus (Re)Agent, a structured agentic system, outperforms baselines in predicting multilingual model performance from incomplete evidence on a new controlled benchmark.
- One Model to Translate Them All? A Journey to Mount Doom for Multilingual Model Merging
  Merging fine-tuned models for multilingual translation fails because fine-tuning redistributes language-specific neurons rather than sharpening them, increasing representational divergence in output-generating layers.
- Mix, Don't Tune: Bilingual Pre-Training Outperforms Hyperparameter Search in Data-Constrained Settings
  Mixing auxiliary high-resource language data outperforms hyperparameter tuning in data-constrained bilingual pre-training, with gains equivalent to 2-13 times more unique target data.
- ATD-Trans: A Geographically Grounded Japanese-English Travelogue Translation Dataset
  ATD-Trans is a new geographically annotated Japanese-English travelogue dataset that reveals Japanese-enhanced models perform better on geo-entity translation while domestic Japanese locations remain harder to transla...
- MLAIRE: Multilingual Language-Aware Information Retrieval Evaluation Protocal
  MLAIRE is a protocol that evaluates multilingual retrievers on both semantic accuracy and query-language preference using parallel passages and new metrics like LPR and Lang-nDCG, showing that standard metrics hide di...
- Attention Sinks in Massively Multilingual Neural Machine Translation: Discovery, Analysis, and Mitigation
  Attention sinks in NLLB-200 cross-attention cause non-content tokens to dominate 83-91% of mass, halving apparent content similarity; content filtering recovers linguistic signals like language clustering and mode dif...
- RouteLMT: Learned Sample Routing for Hybrid LLM Translation Deployment
  RouteLMT learns to route MT requests to large or small LLMs by predicting marginal quality gain from small-model token representations, yielding a better quality-budget Pareto frontier than baselines.
- COMPASS: COntinual Multilingual PEFT with Adaptive Semantic Sampling
  COMPASS uses semantic clustering on multilingual embeddings to select auxiliary data for PEFT adapters, outperforming linguistic-similarity baselines on multilingual benchmarks while supporting continual adaptation.
- MoVE: Translating Laughter and Tears via Mixture of Vocalization Experts in Speech-to-Speech Translation
  MoVE uses specialized LoRA expert adapters and a soft router to translate non-verbal vocalizations in S2ST, reproducing them in 76% of cases versus at most 14% for baselines while scoring highest on naturalness and em...
- Geometry-Aware Localized Watermarking for Copyright Protection in Embedding-as-a-Service
  GeoMark decouples local watermark triggering from centralized ownership attribution using geometry-separated anchors and adaptive neighborhoods to improve robustness against paraphrasing, dimension changes, and cluste...
- Context-Aware Dialectal Arabic Machine Translation with Interactive Region and Register Selection
  A metadata-conditioned mT5 model trained on rule-augmented dialectal Arabic data produces translations that better match intended regional varieties than high-resource baselines, despite lower BLEU scores.
- CLEAR: Cross-Lingual Enhancement in Alignment via Reverse-training
  CLEAR is a reverse-training loss that improves cross-lingual retrieval performance by up to 15% in low-resource languages while minimizing degradation in English by using English as an alignment bridge.
- YoNER: A New Yorùbá Multi-domain Named Entity Recognition Dataset
  YoNER supplies a multi-domain Yoruba NER corpus of 5k sentences plus OyoBERT, showing African-centric models beat multilingual baselines in-domain while cross-domain performance drops sharply for blogs and movies.
- Toward Culturally Grounded Natural Language Processing
  Culturally grounded NLP must shift from isolated language benchmarks to modeling communicative ecologies that encompass institutions, scripts, domains, modalities, and communities.
- CroSearch-R1: Better Leveraging Cross-lingual Knowledge for Retrieval-Augmented Generation
  CroSearch-R1 applies search-augmented RL with cross-lingual integration and multilingual rollouts to improve RAG effectiveness on multilingual collections.
- When Does Data Augmentation Help? Evaluating LLM and Back-Translation Methods for Hausa and Fongbe NLP
  Data augmentation via LLMs and back-translation produces task-specific effects on NER and POS tagging for Hausa and Fongbe, with no consistent gains over baseline and opposite outcomes across tasks for the same synthe...
- Anthropogenic Regional Adaptation in Multimodal Vision-Language Model
  Anthropogenic Regional Adaptation with GG-EZ improves cultural relevance in multimodal vision-language models for Southeast Asia by 5-15% while retaining over 98% of global performance.
- Testing the Assumptions of Active Learning for Translation Tasks with Few Samples
  Informativeness and diversity of samples selected by active learning show no correlation with test performance on translation tasks using few samples; ordering and pre-training effects dominate instead.
- Multilingual E5 Text Embeddings: A Technical Report
  Open-source multilingual E5 embedding models are trained via contrastive pre-training on 1 billion text pairs and fine-tuning, with an instruction-tuned model matching English SOTA performance.
- LLiMba: Sardinian on a Single GPU -- Adapting a 3B Language Model to a Vanishing Romance Language
  Qwen2.5-3B was continued-pretrained and then fine-tuned with rsLoRA r256 on Sardinian data to reach 28.5 BLEU into the language, outperforming full fine-tuning and other LoRA variants.
- MIPIAD: Multilingual Indirect Prompt Injection Attack Defense with Qwen-TF-IDF Hybrid and Meta-Ensemble Learning
  MIPIAD reports a hybrid Qwen-TF-IDF ensemble defense that reaches F1 0.9205 and reduces the English-Bangla performance gap on a 1.43-million-sample synthetic benchmark derived from BIPIA templates.
- Phoenix-VL 1.5 Medium Technical Report
  Phoenix-VL 1.5 Medium is a 123B-parameter natively multimodal model that reaches state-of-the-art results on Singapore multimodal, legal, and policy benchmarks after localized training on 1T+ tokens while staying comp...
- AI-Driven Modular Services for Accessible Multilingual Education in Immersive Extended Reality Settings: Integrating Speech Processing, Translation, and Sign Language Rendering
  A modular XR platform integrates Whisper, NLLB, AWS Polly, RoBERTa, flan-t5, and MediaPipe to deliver real-time multilingual and International Sign support for education, with benchmarks showing AWS Polly's low latenc...
- A Survey of Large Language Models
  This survey reviews the background, key techniques, and evaluation methods for large language models, emphasizing emergent abilities that appear at large scales.