Pyserini: An Easy-to-Use Python Toolkit to Support Replicable IR Research with Sparse and Dense Representations

Jheng-Hong Yang; Jimmy Lin; Rodrigo Nogueira; Ronak Pradeep; Sheng-Chieh Lin; Xueguang Ma

arxiv: 2102.10073 · v1 · pith:N3QGLBPUnew · submitted 2021-02-19 · 💻 cs.IR

Jimmy Lin, Xueguang Ma, Sheng-Chieh Lin, Jheng-Hong Yang, Ronak Pradeep, Rodrigo Nogueira

classification 💻 cs.IR

keywords retrieval, toolkit, pyserini, python, ranking, representations, research, approaches

Pyserini is an easy-to-use Python toolkit that supports replicable IR research by providing effective first-stage retrieval in a multi-stage ranking architecture. Our toolkit is self-contained as a standard Python package and comes with queries, relevance judgments, pre-built indexes, and evaluation scripts for many commonly used IR test collections. We aim to support, out of the box, the entire research lifecycle of efforts aimed at improving ranking with modern neural approaches. In particular, Pyserini supports sparse retrieval (e.g., BM25 scoring using bag-of-words representations), dense retrieval (e.g., nearest-neighbor search on transformer-encoded representations), as well as hybrid retrieval that integrates both approaches. This paper provides an overview of toolkit features and presents empirical results that illustrate its effectiveness on two popular ranking tasks. We also describe how our group has built a culture of replicability through shared norms and tools that enable rigorous automated testing.

Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

BracketRank: Large Language Model Document Ranking via Reasoning-based Competitive Elimination
cs.IR 2026-04 conditional novelty 7.0

BracketRank reranks documents via LLM-driven bracket-style competitive elimination with mandatory reasoning explanations, reaching 26.56 nDCG@10 on BRIGHT and outperforming RankGPT-4 and Rank-R1-14B.

Evaluating LLMs on Real-World Software Performance Optimization
cs.SE 2026-06 unverdicted novelty 6.0

SWE-Pro benchmark shows LLMs deliver negligible runtime gains and almost no memory reductions on 102 real tasks where experts achieve 15.5x aggregate speedup and 171.3x peak memory reduction.

SPECTRA: Synthetic IR Test Collections with Relevance Oracles and Controlled Distractor Diagnostics
cs.IR 2026-05 unverdicted novelty 6.0

SPECTRA generates reproducible synthetic IR corpora up to 60,000 documents with controllable distractors, long-tail vocabulary, and graded relevance labels via a single-process Python prototype.

BiCon-Gate: Consistency-Gated De-colloquialisation for Dialogue Fact-Checking
cs.CL 2026-04 unverdicted novelty 6.0

BiCon-Gate improves dialogue fact-checking by applying staged de-colloquialisation and gating rewrites based on semantic consistency with context, yielding gains on the DialFact benchmark over baselines including LLM ...

Mask-to-Correct$^+$: Leveraging Retriever Diversity for Masking-guided Faithful Fact Correction
cs.IR 2026-04 unverdicted novelty 5.0

Mask-to-Correct and M2C+ use diversity-aware masking in RAG to identify erroneous claim spans and produce faithful corrections, outperforming baselines by up to 14% SARI without gold evidence.