Pith Number

pith:QOQAVYY4

pith:2026:QOQAVYY43YPYNKHKKZWIYBHJLV
Status: not attested · not anchored · not stored · refs resolved

Unlocking Biological Workflows for Robust Protein-Text Question Answering: A Dual-Dimensional RAG Framework

Chen Huang, Duanyu Feng, Li Ding, See-kiong Ng, Wenqiang Lei, Yang Li, Yangshuai Wang

2D-ProteinRAG embeds LLMs in BLAST workflows with dual filtering to handle novel proteins in question answering

arxiv:2605.17261 v1 · 2026-05-17 · cs.IR

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{QOQAVYY43YPYNKHKKZWIYBHJLV}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Extensive evaluations on both In-Distribution and diverse biological OOD benchmarks demonstrate that 2D-ProteinRAG consistently achieves state-of-the-art performance, outperforming fine-tuned baselines and other RAG methods.

C2 · weakest assumption

That the proposed horizontal fine-grained attribute alignment and vertical homology-based semantic denoising steps, when applied after BLAST retrieval, will reliably extract high-quality information from noisy contexts and generalize to novel proteins without introducing new errors.

C3 · one-line summary

2D-ProteinRAG is a dual-dimensional RAG framework that incorporates BLAST workflows plus horizontal attribute alignment and vertical homology denoising to improve protein-text QA on both in-distribution and out-of-distribution cases.

References

38 extracted · 38 resolved · 4 Pith anchors

[1] S F Altschul, W Gish, W Miller, E W Myers, and D J Lipman. 1990. Basic local alignment search tool. J. Mol. Biol. 215, 3 (Oct. 1990), 403–410
[2] Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities. 2025 · arXiv:2507.06261
[3] The UniProt Consortium. 2024. UniProt: the Universal Protein Knowledgebase in 2025. Nucleic Acids Research 53, D1 (11 2024), D609–D617. arXiv:https://academic.oup.com/nar/article-pdf/53/D1/D609/60719 · doi:10.1093/nar/gkae1010
[4] D Devos and A Valencia. 2000. Practical limits of function prediction. Proteins 41, 1 (Oct. 2000), 98–107
[5] arXiv preprint arXiv:2501.10282 (2025)

Formal links

1 machine-checked theorem link

Receipt and verification
First computed 2026-05-20T00:03:48.299779Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

83a00ae31cde1f86a8ea566c8c04e95d654531184adaea6006f9a4ffe0cafb51

Aliases

arxiv: 2605.17261 · arxiv_version: 2605.17261v1 · doi: 10.48550/arxiv.2605.17261 · pith_short_12: QOQAVYY43YPY · pith_short_16: QOQAVYY43YPYNKHK · pith_short_8: QOQAVYY4
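The short aliases listed above look like plain prefixes of the full 26-character identifier. For this record, the identifier also matches the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash. The Python sketch below checks both observations. The function names `base32_prefix` and `short_aliases` are hypothetical, and the derivation rule is an assumption inferred from this single record, not a documented part of the `pith-number/v1.0` schema.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "83a00ae31cde1f86a8ea566c8c04e95d654531184adaea6006f9a4ffe0cafb51"
PITH_NUMBER = "QOQAVYY43YPYNKHKKZWIYBHJLV"


def base32_prefix(hex_digest: str, length: int = 26) -> str:
    """Assumed rule: Base32-encode the raw digest bytes and keep a prefix."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_aliases(pith_number: str) -> dict:
    """Assumed rule: each short alias is a fixed-length prefix of the identifier."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


if __name__ == "__main__":
    # Compare the assumed derivation against the identifier shown in the record.
    print("identifier matches hash prefix:", base32_prefix(CANONICAL_HASH) == PITH_NUMBER)
    for name, value in short_aliases(PITH_NUMBER).items():
        print(name, value)
```

If the assumption holds in general, a client could sanity-check that an identifier and a canonical hash belong together without any network access, though only the hash comparison in the verification command is described by the record itself.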
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/QOQAVYY43YPYNKHKKZWIYBHJLV \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 83a00ae31cde1f86a8ea566c8c04e95d654531184adaea6006f9a4ffe0cafb51
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "b057d46bd46226285775af30661f89963b9652bb9a3ef8661d75363266f42e15",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.IR",
    "submitted_at": "2026-05-17T05:03:24Z",
    "title_canon_sha256": "c31592c0b5e6642281f93c4b001ef43594e0eac7f39013260bcc93ae2580591f"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.17261",
    "kind": "arxiv",
    "version": 1
  }
}
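
The hashing step of the verification command can also be run offline against the record printed above. The sketch below applies the same serialization as the one-liner (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 encoding) and compares the digest with the canonical hash. The helper name `canonical_sha256` is hypothetical, and it is an assumption that the JSON shown on this page is identical to the `canonical_record` returned by the API, so no outcome is claimed for the comparison.

```python
import hashlib
import json

EXPECTED_HASH = "83a00ae31cde1f86a8ea566c8c04e95d654531184adaea6006f9a4ffe0cafb51"


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then SHA-256 the UTF-8 bytes."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Record copied from the "Canonical record JSON" section of this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "b057d46bd46226285775af30661f89963b9652bb9a3ef8661d75363266f42e15",
        "cross_cats_sorted": [],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.IR",
        "submitted_at": "2026-05-17T05:03:24Z",
        "title_canon_sha256": "c31592c0b5e6642281f93c4b001ef43594e0eac7f39013260bcc93ae2580591f",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.17261", "kind": "arxiv", "version": 1},
}

digest = canonical_sha256(record)
print(digest)
print("matches expected hash:", digest == EXPECTED_HASH)
```

Because the keys are sorted before hashing, the digest does not depend on the order in which a mirror or client stores the fields.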