Pith Number

pith:7GYTYXCZ

pith:2026:7GYTYXCZINHO3FIRS5UGUUW5U6
not attested · not anchored · not stored · refs resolved

Efficient Multi-objective Prompt Optimization via Pure-exploration Bandits

Chengshuai Shi, Cong Shen, Donghao Li, Jing Yang, Weijuan Ou

Multi-objective prompt selection for large language models reduces to pure-exploration bandit problems, enabling efficient algorithms with theoretical guarantees.

arxiv:2605.14553 v1 · 2026-05-14 · cs.LG · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{7GYTYXCZINHO3FIRS5UGUUW5U6}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open · sign in to claim
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

Casting the problem into the pure-exploration bandits framework, we adapt provably efficient algorithms from multi-objective bandits and further introduce a novel design for best feasible arm identification in structured bandits, with theoretical guarantees on the identification error in the linear case. Extensive experiments across multiple LLMs show that the bandit-based approaches yield significant improvements over baselines.

C2 weakest assumption

Prompt performance across multiple objectives can be modeled as rewards from independent or linearly structured arms in a pure-exploration bandit setting without significant interference or non-stationarity from LLM stochasticity.

C3 one-line summary

Adapting multi-objective pure-exploration bandits enables efficient Pareto prompt set recovery and best feasible prompt identification for LLMs, with linear-case guarantees and empirical gains over baselines.
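
To make the two identification targets named in these claims concrete, here is a minimal Python sketch. It is not the authors' algorithm: it explores uniformly, whereas the paper adapts more sample-efficient bandit methods, and it simply reads the Pareto set and the best feasible arm off the empirical means. The arm means, the Bernoulli reward model, the feasibility threshold, and every function name are illustrative assumptions.

```python
import random
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]


def uniform_explore(pull: Callable[[int], Vector], n_arms: int,
                    pulls_per_arm: int) -> List[List[float]]:
    """Pull every arm equally often and return per-arm mean reward vectors."""
    means = []
    for arm in range(n_arms):
        samples = [pull(arm) for _ in range(pulls_per_arm)]
        n_obj = len(samples[0])
        means.append([sum(s[j] for s in samples) / pulls_per_arm
                      for j in range(n_obj)])
    return means


def dominates(a: Vector, b: Vector) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_set(means: Sequence[Vector]) -> List[int]:
    """Indices of arms that no other arm dominates (higher is better)."""
    return [i for i, m in enumerate(means)
            if not any(dominates(o, m) for k, o in enumerate(means) if k != i)]


def best_feasible(means: Sequence[Vector], threshold: float) -> Optional[int]:
    """Arm maximizing objective 0 subject to objective 1 >= threshold."""
    feasible = [i for i, m in enumerate(means) if m[1] >= threshold]
    return max(feasible, key=lambda i: means[i][0]) if feasible else None


# Hypothetical prompts ("arms") with two objectives, e.g. accuracy and safety.
TRUE_MEANS = [(0.9, 0.2), (0.6, 0.7), (0.5, 0.5), (0.3, 0.9)]
rng = random.Random(0)


def pull(arm: int) -> List[float]:
    """Simulated noisy evaluation: one Bernoulli reward per objective."""
    return [1.0 if rng.random() < p else 0.0 for p in TRUE_MEANS[arm]]


estimates = uniform_explore(pull, len(TRUE_MEANS), pulls_per_arm=2000)
print("Pareto set:", pareto_set(estimates))
print("Best feasible arm:", best_feasible(estimates, threshold=0.6))
```

On the assumed true means, arm 2 is dominated by arm 1, so the Pareto set is arms 0, 1, and 3, and arm 1 is the best feasible arm under the 0.6 threshold. With enough pulls the empirical estimates should recover both answers; the point of a pure-exploration method is to do so with fewer evaluations than uniform sampling.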

References

33 extracted · 33 resolved · 4 Pith anchors

[1] GEPA: Reflective prompt evolution can outperform reinforcement learning. 2025
[2] Best arm identification in multi-armed bandits. 2010
[3] Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901. 2020
[5] LEAF: A benchmark for federated settings
[6] Discrete prompt optimization via constrained generation for zero-shot re-ranker. 2023
Receipt and verification
First computed 2026-05-17T23:39:05.682032Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

f9b13c5c59434eed951197686a52dda7857842c3d601f2d8a70f0ddd4a739b9c

Aliases

arxiv: 2605.14553 · arxiv_version: 2605.14553v1 · doi: 10.48550/arxiv.2605.14553 · pith_short_12: 7GYTYXCZINHO · pith_short_16: 7GYTYXCZINHO3FIR · pith_short_8: 7GYTYXCZ
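
For this record, the short aliases are prefixes of the full Pith Number, and the Pith Number itself matches the first 26 characters of the RFC 4648 base32 encoding of the canonical hash shown above. The sketch below checks that relationship in Python. Treating it as the general derivation rule is an assumption drawn from this one record, not something the page states, and the function name is hypothetical.

```python
import base64


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the raw hash bytes and keep the first `length` characters.

    Assumption: this mirrors how the identifier relates to the hash for this
    record; the page does not document the rule.
    """
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii")[:length]


CANONICAL_HASH = "f9b13c5c59434eed951197686a52dda7857842c3d601f2d8a70f0ddd4a739b9c"

pith = pith_number_from_hash(CANONICAL_HASH)
print(pith)                            # 7GYTYXCZINHO3FIRS5UGUUW5U6
print(pith[:8], pith[:12], pith[:16])  # 7GYTYXCZ 7GYTYXCZINHO 7GYTYXCZINHO3FIR
```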
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/7GYTYXCZINHO3FIRS5UGUUW5U6 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: f9b13c5c59434eed951197686a52dda7857842c3d601f2d8a70f0ddd4a739b9c
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "f050c93ebd59f08f662dc444c16665764f030df1eef20bb1bde6cb98a3ae691b",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-14T08:31:17Z",
    "title_canon_sha256": "2c149f0a446ae0b8ac99743af2877fb60c45c6e55da7fa3e0aad129f942dc622"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.14553",
    "kind": "arxiv",
    "version": 1
  }
}
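
The shell one-liner above can also be run offline against the record as printed here. The following Python sketch applies the same canonicalization (sorted keys, compact separators, no ASCII escaping, UTF-8) and then SHA-256. The function name is hypothetical. If the page's recipe holds and the JSON shown is the complete canonical record, the digest should equal the published canonical hash; the script reports the comparison rather than assuming it.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record using the canonical form from the page's shell recipe."""
    payload = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # compact form, no whitespace
        ensure_ascii=False,      # keep non-ASCII characters unescaped
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# The canonical record exactly as printed on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "f050c93ebd59f08f662dc444c16665764f030df1eef20bb1bde6cb98a3ae691b",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-05-14T08:31:17Z",
        "title_canon_sha256": "2c149f0a446ae0b8ac99743af2877fb60c45c6e55da7fa3e0aad129f942dc622",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.14553", "kind": "arxiv", "version": 1},
}

EXPECTED = "f9b13c5c59434eed951197686a52dda7857842c3d601f2d8a70f0ddd4a739b9c"

digest = canonical_sha256(record)
print(digest)
print("matches published hash:", digest == EXPECTED)
```

Because keys are sorted before hashing, a mirror that stores the same fields in a different order should compute the same digest.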