Pith Number

pith:234KR6DC

pith:2020:234KR6DCQKCXTM72EWFGAAXG5V
not attested · not anchored · not stored · refs resolved

TabTransformer: Tabular Data Modeling Using Contextual Embeddings

Ashish Khetan, Milan Cvitkovic, Xin Huang, Zohar Karnin

TabTransformer applies self-attention to categorical feature embeddings to create contextual representations that raise prediction accuracy on tabular data.

arxiv:2012.06678 v1 · 2020-12-11 · cs.LG · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{234KR6DCQKCXTM72EWFGAAXG5V}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle: live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Through extensive experiments on fifteen publicly available datasets, we show that the TabTransformer outperforms the state-of-the-art deep learning methods for tabular data by at least 1.0% on mean AUC, and matches the performance of tree-based ensemble models.
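Claim C1 is stated in terms of mean AUC. The sketch below shows one common way such a figure is computed: ROC AUC per dataset using the pairwise (Mann-Whitney) formulation, averaged across datasets, with the gap between two models expressed in percentage points. The dataset names, labels, and scores are made-up toy values, and reading "1.0%" as percentage points is an assumption, not something this record states.

```python
from statistics import mean


def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: (labels, {model name: predicted scores}) per dataset.
datasets = {
    "toy_a": ([1, 0, 1, 0, 1], {"model_x": [0.9, 0.2, 0.6, 0.4, 0.7],
                                "model_y": [0.8, 0.3, 0.4, 0.5, 0.6]}),
    "toy_b": ([0, 0, 1, 1], {"model_x": [0.1, 0.5, 0.4, 0.8],
                             "model_y": [0.2, 0.6, 0.3, 0.7]}),
}

# Mean AUC per model: average the per-dataset AUCs with equal weight.
mean_auc = {
    model: mean(roc_auc(labels, preds[model]) for labels, preds in datasets.values())
    for model in ("model_x", "model_y")
}

# Gap between the two models, in percentage points of mean AUC.
gap_points = 100 * (mean_auc["model_x"] - mean_auc["model_y"])
print(mean_auc, round(gap_points, 2))
```

Averaging per-dataset AUCs gives every dataset equal weight regardless of its size, which is why a mean-AUC gap says little about any single dataset.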

C2 · weakest assumption

The fifteen public datasets are representative of real-world tabular distributions, and the baseline deep learning and tree-based methods were tuned to their best possible performance, with no hidden advantages for the proposed model.

C3 · one-line summary

TabTransformer uses Transformer self-attention to generate contextual embeddings from categorical features in tabular data. On 15 public datasets it outperforms prior deep learning methods by at least 1.0% in mean AUC and matches tree-based ensembles, while also showing robustness to missing and noisy features.
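A minimal NumPy sketch of a TabTransformer-style forward pass, to make the summary concrete. It assumes the layout commonly described for the model: each categorical column is looked up in its own embedding table, a Transformer block applies self-attention across the columns so each embedding becomes contextual, and the flattened result is concatenated with layer-normalized continuous features and fed to an MLP. This is not the authors' implementation; it uses one Transformer layer with a single attention head, plain per-column lookup tables, LayerNorm without learned scale and shift, and untrained random weights. All sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes, not values from the paper.
CARDINALITIES = [5, 3, 7]  # categories per categorical column
N_CONT = 3                 # number of continuous columns
D = 8                      # embedding dimension
HIDDEN = 16                # MLP hidden width
m = len(CARDINALITIES)


def layer_norm(x, eps=1e-5):
    # Normalize over the last axis (no learned scale/shift in this sketch).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


# Randomly initialised (untrained) parameters.
emb_tables = [rng.normal(size=(c, D)) for c in CARDINALITIES]
Wq, Wk, Wv = (rng.normal(size=(D, D)) / np.sqrt(D) for _ in range(3))
W_ff1 = rng.normal(size=(D, 4 * D)) / np.sqrt(D)
W_ff2 = rng.normal(size=(4 * D, D)) / np.sqrt(4 * D)
W1 = rng.normal(size=(m * D + N_CONT, HIDDEN)) / np.sqrt(m * D + N_CONT)
W2 = rng.normal(size=(HIDDEN, 1)) / np.sqrt(HIDDEN)


def transformer_block(E):
    # E: (batch, m, D). Single-head self-attention across the m columns.
    q, k, v = E @ Wq, E @ Wk, E @ Wv
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(D))  # (batch, m, m)
    h = layer_norm(E + attn @ v)                           # residual + norm
    ff = np.maximum(h @ W_ff1, 0.0) @ W_ff2                # per-column ReLU MLP
    return layer_norm(h + ff)


def embed(x_cat):
    # Look up each column's codes in that column's table -> (batch, m, D).
    return np.stack([emb_tables[j][x_cat[:, j]] for j in range(m)], axis=1)


def forward(x_cat, x_cont):
    # x_cat: (batch, m) integer codes; x_cont: (batch, N_CONT) floats.
    contextual = transformer_block(embed(x_cat)).reshape(len(x_cat), -1)
    z = np.concatenate([contextual, layer_norm(x_cont)], axis=1)
    logits = np.maximum(z @ W1, 0.0) @ W2
    return 1.0 / (1.0 + np.exp(-logits[:, 0]))  # positive-class probability


# Usage on a random toy batch of 4 rows.
x_cat = np.stack([rng.integers(0, c, size=4) for c in CARDINALITIES], axis=1)
x_cont = rng.normal(size=(4, N_CONT))
p = forward(x_cat, x_cont)
print(p.shape)
```

The attention step is what makes the embeddings "contextual": changing the category in one column changes the output representation of every other column, which a plain per-column embedding would not do.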

References

99 extracted · 99 resolved · 8 Pith anchors

Formal links

1 machine-checked theorem link

Cited by

35 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:46.506714Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

d6f8a8f862828579b3fa258a6002e6ed6ab6c0eac508138209d7c435e53f440a

Aliases

arxiv: 2012.06678 · arxiv_version: 2012.06678v1 · doi: 10.48550/arxiv.2012.06678 · pith_short_12: 234KR6DCQKCX · pith_short_16: 234KR6DCQKCXTM72 · pith_short_8: 234KR6DC
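Comparing the values on this page suggests how the identifiers relate: the 26-character Pith Number matches the first 26 characters (130 bits) of the RFC 4648 base32 encoding of the 32-byte canonical hash, and each pith_short_N alias is its first N characters. This is an observation from this single record, not a documented rule of the pith-number/v1.0 schema; the helper name below is hypothetical.

```python
import base64

CANONICAL_HASH = "d6f8a8f862828579b3fa258a6002e6ed6ab6c0eac508138209d7c435e53f440a"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    # Hypothesis inferred from this record: base32-encode the full SHA-256
    # digest and keep the first `length` characters (5 bits per character).
    digest = bytes.fromhex(hex_digest)
    return base64.b32encode(digest).decode("ascii")[:length]


pith = pith_number_from_hash(CANONICAL_HASH)
# The short aliases listed on this page are prefixes of the full number.
aliases = {f"pith_short_{n}": pith[:n] for n in (8, 12, 16)}
print(pith)     # 234KR6DCQKCXTM72EWFGAAXG5V
print(aliases)
```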
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/234KR6DCQKCXTM72EWFGAAXG5V \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: d6f8a8f862828579b3fa258a6002e6ed6ab6c0eac508138209d7c435e53f440a
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "14a98b2ef24421bfa593b9c24470b298a492a660bbbab01d932a8b622d357324",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/publicdomain/zero/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2020-12-11T23:31:23Z",
    "title_canon_sha256": "da775df861a4305faae84048f4463e76aecd6af3620932a75a861ab369a98ff6"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2012.06678",
    "kind": "arxiv",
    "version": 1
  }
}
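An offline variant of the Agent API check, applied to the canonical record JSON printed above instead of fetching it. It reproduces the one-liner's serialization (sorted keys, `(',', ':')` separators, `ensure_ascii=False`, UTF-8) and hashes the result with SHA-256. It assumes the printed record is identical to the `canonical_record` field served by the API; if so, the digest should equal the canonical hash, and the script reports whether it does rather than asserting it.

```python
import hashlib
import json

# The canonical record as printed on this page (assumed identical to the
# API's `canonical_record` field).
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "14a98b2ef24421bfa593b9c24470b298a492a660bbbab01d932a8b622d357324",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://creativecommons.org/publicdomain/zero/1.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2020-12-11T23:31:23Z",
        "title_canon_sha256": "da775df861a4305faae84048f4463e76aecd6af3620932a75a861ab369a98ff6",
    },
    "schema_version": "1.0",
    "source": {"id": "2012.06678", "kind": "arxiv", "version": 1},
}
EXPECTED = "d6f8a8f862828579b3fa258a6002e6ed6ab6c0eac508138209d7c435e53f440a"


def canonical_bytes(obj) -> bytes:
    # Same rules as the one-liner: keys sorted at every level, no
    # insignificant whitespace, non-ASCII characters kept as UTF-8.
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def canonical_hash(obj) -> str:
    return hashlib.sha256(canonical_bytes(obj)).hexdigest()


digest = canonical_hash(RECORD)
print(digest)
print("match" if digest == EXPECTED else "mismatch")
```

Because keys are sorted during serialization, the digest does not depend on the key order in which the record happens to be written or transmitted.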