Pith Number

pith:234KR6DC

pith:2020:234KR6DCQKCXTM72EWFGAAXG5V

not attested · not anchored · not stored · refs resolved

TabTransformer: Tabular Data Modeling Using Contextual Embeddings

Ashish Khetan, Milan Cvitkovic, Xin Huang, Zohar Karnin

TabTransformer applies self-attention to categorical feature embeddings to create contextual representations that raise prediction accuracy on tabular data.

arxiv:2012.06678 v1 · 2020-12-11 · cs.LG · cs.AI


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{234KR6DCQKCXTM72EWFGAAXG5V}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1. Bitcoin timestamp
2. Internet Archive
3. Author claim: open
4. Citations: open
5. Replications: open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Through extensive experiments on fifteen publicly available datasets, we show that the TabTransformer outperforms the state-of-the-art deep learning methods for tabular data by at least 1.0% on mean AUC, and matches the performance of tree-based ensemble models.

C2 · weakest assumption

That the fifteen public datasets are representative of real-world tabular distributions, and that the baseline deep learning and tree-based methods were tuned to their best possible performance, with no hidden advantage for the proposed model.

C3 · one-line summary

TabTransformer uses Transformer self-attention to generate contextual embeddings from categorical features in tabular data. It outperforms prior deep learning methods by at least 1% mean AUC and matches tree-based ensembles on 15 public datasets, while showing robustness to missing and noisy features.
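To make "contextual embeddings from categorical features" concrete, here is a minimal sketch of the idea, not the paper's implementation. It assumes a single attention head with identity query, key, and value projections, whereas real Transformer layers use learned projections, multiple heads, and stacked layers. The `EMBEDDINGS` table, the column names, and the helper functions are all hypothetical.

```python
import math

# Hypothetical embedding table: (column, category) -> vector.
# The values are made up for illustration; a real model learns them.
EMBEDDINGS = {
    ("education", "masters"): [0.9, 0.1, 0.0],
    ("occupation", "engineer"): [0.7, 0.3, 0.2],
    ("marital_status", "single"): [0.0, 0.8, 0.5],
}


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with identity Q/K/V (an assumption)."""
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        # Similarity of this column's embedding to every column's embedding.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # Each output is a weighted average of all column embeddings.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, vectors)) for j in range(dim)]
        )
    return outputs


def contextualize(row):
    """Map one row of categorical values to per-column contextual embeddings."""
    vectors = [EMBEDDINGS[(col, val)] for col, val in row.items()]
    return dict(zip(row.keys(), self_attention(vectors)))


if __name__ == "__main__":
    row = {"education": "masters", "occupation": "engineer", "marital_status": "single"}
    for column, vector in contextualize(row).items():
        print(column, [round(x, 3) for x in vector])
```

Each column's output vector depends on the other columns in the same row, which is what makes the representation contextual rather than a fixed per-category lookup.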

References

99 extracted · 99 resolved · 8 Pith anchors

Formal links

1 machine-checked theorem link

Cited by

35 papers in Pith

Unlock the Potential of Large Language Models for Predictive Tabular Tasks in Data Science with Table-Specific Pretraining

Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction

Data Balancing Strategies: A Systematic Survey of Resampling and Augmentation Methods

TabPFN-MT: A Natively Multitask In-Context Learner for Tabular Data

ReSS: Learning Reasoning Models for Tabular Data Prediction via Symbolic Scaffold

Receipt and verification

First computed	2026-05-17T23:38:46.506714Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

d6f8a8f862828579b3fa258a6002e6ed6ab6c0eac508138209d7c435e53f440a

Aliases

arxiv: 2012.06678 · arxiv_version: 2012.06678v1 · doi: 10.48550/arxiv.2012.06678 · pith_short_12: 234KR6DCQKCX · pith_short_16: 234KR6DCQKCXTM72 · pith_short_8: 234KR6DC
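In this record, the short aliases are prefixes of the full 26-character identifier, and the DOI follows the `10.48550/arxiv.<id>` pattern. The sketch below reproduces the aliases under the assumption that this pattern holds generally; the page does not state the derivation rule, and `derive_aliases` is a hypothetical helper rather than part of any Pith API.

```python
def derive_aliases(pith_id: str, arxiv_id: str, version: int) -> dict:
    """Rebuild the alias set shown on this page from the full ID and arXiv ID.

    Assumption: short aliases are plain prefixes of the full identifier.
    """
    return {
        "arxiv": arxiv_id,
        "arxiv_version": f"{arxiv_id}v{version}",
        "doi": f"10.48550/arxiv.{arxiv_id}",
        "pith_short_8": pith_id[:8],
        "pith_short_12": pith_id[:12],
        "pith_short_16": pith_id[:16],
    }


if __name__ == "__main__":
    aliases = derive_aliases("234KR6DCQKCXTM72EWFGAAXG5V", "2012.06678", 1)
    for name, value in aliases.items():
        print(f"{name}: {value}")
```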

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/234KR6DCQKCXTM72EWFGAAXG5V \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: d6f8a8f862828579b3fa258a6002e6ed6ab6c0eac508138209d7c435e53f440a

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "14a98b2ef24421bfa593b9c24470b298a492a660bbbab01d932a8b622d357324",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/publicdomain/zero/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2020-12-11T23:31:23Z",
    "title_canon_sha256": "da775df861a4305faae84048f4463e76aecd6af3620932a75a861ab369a98ff6"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2012.06678",
    "kind": "arxiv",
    "version": 1
  }
}
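
The hashing step of the shell check can also be run offline against the record printed above. This sketch uses the same serialization as the one-liner (sorted keys, compact separators, `ensure_ascii=False`, UTF-8). If the builder serializes the same way, the digest should equal the canonical hash listed on this page; that equality has not been checked here. The function name `canonical_sha256` is hypothetical.

```python
import hashlib
import json

# Canonical record copied verbatim from this page.
CANONICAL_RECORD = {
    "metadata": {
        "abstract_canon_sha256": "14a98b2ef24421bfa593b9c24470b298a492a660bbbab01d932a8b622d357324",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://creativecommons.org/publicdomain/zero/1.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2020-12-11T23:31:23Z",
        "title_canon_sha256": "da775df861a4305faae84048f4463e76aecd6af3620932a75a861ab369a98ff6",
    },
    "schema_version": "1.0",
    "source": {"id": "2012.06678", "kind": "arxiv", "version": 1},
}


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then hash the UTF-8 bytes."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


if __name__ == "__main__":
    # Compare this output by eye with the "Canonical hash" value on the page.
    print(canonical_sha256(CANONICAL_RECORD))
```

Because the keys are sorted before hashing, the digest does not depend on the order in which a mirror stores or transmits the record's fields.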