Pith Number

pith:ZGTKTZFW

pith:2023:ZGTKTZFW6VAR2IZJADJARJK4TJ
not attested · not anchored · not stored · refs resolved

Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment

Hang Li, Hao Cheng, Jean-Francois Ton, Muhammad Faaiz Taufiq, Ruocheng Guo, Xiaoying Zhang, Yang Liu, Yegor Klochkov, Yuanshun Yao

A survey finds that more aligned LLMs generally achieve higher trustworthiness, though the gains differ across categories.

arxiv:2308.05374 v2 · 2023-08-10 · cs.AI · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{ZGTKTZFW6VAR2IZJADJARJK4TJ}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

The measurement results indicate that, in general, more aligned models tend to perform better in terms of overall trustworthiness. However, the effectiveness of alignment varies across the different trustworthiness categories considered.

C2 weakest assumption

The seven categories and 29 sub-categories comprehensively capture trustworthiness, and the eight selected sub-categories, together with the chosen measurement methods, accurately reflect real-world alignment.

C3 one line summary

The survey organizes LLM trustworthiness into seven categories and 29 sub-categories, measures eight sub-categories on popular models, and finds that more aligned models generally score higher, although the effectiveness of alignment varies across categories.

References

300 extracted · 300 resolved · 29 Pith anchors

[1] Training language models to follow instructions with human feedback, 2022
[2] Alignment of language agents, 2021
[3] OpenAI. GPT-4. https://openai.com/research/gpt-4, 2023
[4] On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pages 610–623, 2021
[5] Language models are unsupervised multitask learners, 2019

Formal links

2 machine-checked theorem links

Cited by

29 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:12.820356Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

c9a6a9e4b6f5411d232900d208a55c9a7de412fd7489d4c2e8ab15a9219e1409

Aliases

arxiv: 2308.05374 · arxiv_version: 2308.05374v2 · doi: 10.48550/arxiv.2308.05374 · pith_short_12: ZGTKTZFW6VAR · pith_short_16: ZGTKTZFW6VAR2IZJ · pith_short_8: ZGTKTZFW
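The short aliases listed here are prefixes of the full 26-character Pith Number. In this record the full number also coincides with the first 26 characters of the standard Base32 (RFC 4648) encoding of the canonical hash. The Python sketch below checks both observations for this record only. It is not a documented derivation rule, and the helper name base32_prefix and the constant names are illustrative assumptions.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "c9a6a9e4b6f5411d232900d208a55c9a7de412fd7489d4c2e8ab15a9219e1409"
PITH_NUMBER = "ZGTKTZFW6VAR2IZJADJARJK4TJ"


def base32_prefix(hex_digest: str, length: int) -> str:
    """Return the first `length` characters of the Base32 encoding of a hex digest.

    Hypothetical helper: it reflects a pattern observed in this record,
    not a published specification.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


# Short aliases, assuming they are plain prefixes of the full number.
aliases = {f"pith_short_{n}": PITH_NUMBER[:n] for n in (8, 12, 16)}

for name, value in sorted(aliases.items()):
    print(f"{name}: {value}")

print("matches canonical hash:", base32_prefix(CANONICAL_HASH, len(PITH_NUMBER)) == PITH_NUMBER)
```

A consumer that stores only a short alias cannot recover the full number from it, so the full 26-character form is the safer key for lookups.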
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/ZGTKTZFW6VAR2IZJADJARJK4TJ \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: c9a6a9e4b6f5411d232900d208a55c9a7de412fd7489d4c2e8ab15a9219e1409
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "f486721c6f283b619343311b946661d598241a74b6d7b31ef1a7c3e8492341d3",
    "cross_cats_sorted": [
      "cs.LG"
    ],
    "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2023-08-10T06:43:44Z",
    "title_canon_sha256": "e4f29685ef9212d331f35b161dfd4efe86e04c62c4d0faf6cdb9dac9031623f4"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2308.05374",
    "kind": "arxiv",
    "version": 2
  }
}
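
The shell pipeline under "Verify this Pith Number yourself" can also be written as a standalone Python function. This is a minimal sketch that assumes the canonicalization is exactly what that one-liner does: keys sorted, compact separators, no ASCII escaping, UTF-8 bytes, then SHA-256. The function name canonical_sha256 is hypothetical, and the record literal is copied from the JSON shown above.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize a record deterministically and return its SHA-256 hex digest.

    Assumption: mirrors the json.dumps options used in the shell one-liner.
    """
    canonical = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no insignificant whitespace
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Record copied from "Canonical record JSON" above.
record = {
    "metadata": {
        "abstract_canon_sha256": "f486721c6f283b619343311b946661d598241a74b6d7b31ef1a7c3e8492341d3",
        "cross_cats_sorted": ["cs.LG"],
        "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
        "primary_cat": "cs.AI",
        "submitted_at": "2023-08-10T06:43:44Z",
        "title_canon_sha256": "e4f29685ef9212d331f35b161dfd4efe86e04c62c4d0faf6cdb9dac9031623f4",
    },
    "schema_version": "1.0",
    "source": {"id": "2308.05374", "kind": "arxiv", "version": 2},
}

digest = canonical_sha256(record)
print(digest)
```

If the JSON shown is the complete canonical record, the printed digest should equal the value listed under "Canonical hash". That comparison has not been run here, so treat a mismatch as a prompt to fetch the record from the API rather than as proof of tampering.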