arxiv: 2511.19693 · v3 · submitted 2025-11-24 · 💻 cs.LG · cs.AI

TREASURE: The Visa Payment Foundation Model for High-Volume Transaction Understanding

Chin-Chia Michael Yeh, Uday Singh Saini, Xin Dai, Xiran Fan, Shubham Jain, Yujie Fan, Jiarui Sun, Junpeng Wang, Menghai Pan, Yingtong Dou, Yuzhong Chen, Vineeth Rakesh, Liang Wang, Yan Zheng, Mahashweta Das


Pith reviewed 2026-05-17 05:24 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords · foundation model · transformer · transaction data · payment networks · abnormal behavior detection · recommendation systems · consumer behavior · fraud detection


The pith

A transformer model for payment transactions captures both consumer patterns and network signals to improve fraud detection and recommendations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces TREASURE, a transformer-based foundation model built specifically for high-volume payment transaction records. It processes both individual consumer behavior and payment-system details such as response codes to support tasks like spotting unusual activity and generating personalized suggestions. A reader would care because better modeling of this data could make commerce safer and more tailored to users. The architecture includes separate input handling for static and time-varying attributes, plus a training approach suited to categorical attributes with very many possible values. Tests on industry-grade datasets show that the model, used on its own, improves abnormal behavior detection by 111 percent over current production systems and, when supplying embeddings, improves recommendation models by 104 percent.

Core claim

TREASURE is a multipurpose transformer-based foundation model for transaction data that simultaneously captures consumer behavior and payment network signals, featuring an input module with dedicated sub-modules for static and dynamic attributes, an efficient training paradigm for predicting high-cardinality categorical attributes, and demonstrated effectiveness as both a standalone model that increases abnormal behavior detection performance by 111% over production systems and an embedding provider that enhances recommendation models by 104%.

What carries the argument

The TREASURE transformer model with dedicated sub-modules for static and dynamic transaction attributes and an efficient training paradigm for high-cardinality categorical attributes.
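The following is a minimal sketch of how a static/dynamic input split could work. It rests on an assumption that does not come from the paper: static attributes are card-level fields that stay constant across a card's transaction sequence and are embedded once per sequence, while dynamic attributes are embedded for every transaction, with the shared static vector added at each position. The embedding tables, field names (`issuer_country`, `mcc`, `response_code`), and helper functions are hypothetical. Summing embeddings stands in for whatever combination the real input module uses.

```python
import random

DIM = 4  # toy embedding width (hypothetical; not the paper's setting)

def make_table(vocab, dim=DIM, seed=0):
    """Hypothetical lookup table: one random vector per categorical value."""
    rng = random.Random(seed)
    return {v: [rng.uniform(-1, 1) for _ in range(dim)] for v in vocab}

def add(a, b):
    """Element-wise sum of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]

def build_inputs(static, dynamic_seq, static_tables, dynamic_tables):
    """Return one input vector per transaction, plus the shared static vector.

    static:      dict of card-level attributes, constant over the sequence
    dynamic_seq: list of dicts, one per transaction
    Static attributes are embedded ONCE and reused at every position,
    instead of being re-embedded for each transaction.
    """
    static_vec = [0.0] * DIM
    for field, value in static.items():
        static_vec = add(static_vec, static_tables[field][value])

    outputs = []
    for txn in dynamic_seq:
        vec = list(static_vec)  # start from the shared static context
        for field, value in txn.items():
            vec = add(vec, dynamic_tables[field][value])
        outputs.append(vec)
    return outputs, static_vec

# Illustrative tables and values only
static_tables = {"issuer_country": make_table(["US", "GB"], seed=1)}
dynamic_tables = {"mcc": make_table(["5411", "5812"], seed=2),
                  "response_code": make_table(["00", "05"], seed=3)}

seq = [{"mcc": "5411", "response_code": "00"},
       {"mcc": "5812", "response_code": "05"}]
inputs, static_vec = build_inputs({"issuer_country": "US"}, seq,
                                  static_tables, dynamic_tables)
```

Under this assumption, the static lookups run once per sequence instead of once per transaction. That is one plausible source of the training and inference savings the paper attributes to its input module.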

If this is right

Abnormal behavior detection performance increases substantially over existing production systems.
Recommendation systems gain accuracy when using embeddings generated by the model.
Training and inference become more efficient due to the specialized input module and training paradigm.
A single model representation combines consumer behavior signals with payment network details such as response codes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same architecture could be retrained on transaction data from other payment networks to test transferability.
Similar input and training designs might apply to other high-volume sequential records such as user activity logs.
Real-time versions of the model could support immediate monitoring of incoming transactions.
Public benchmarks on open datasets would clarify how much the gains depend on the original Visa data characteristics.

Load-bearing premise

The performance gains depend on proprietary industry-grade datasets whose selection, labeling, and train-test splits are not described in detail.

What would settle it

Evaluating TREASURE on an independent public transaction dataset and finding no gain over standard production baselines would show the improvements do not hold more generally.

Figures

Figures reproduced from arXiv: 2511.19693 by Chin-Chia Michael Yeh, Jiarui Sun, Junpeng Wang, Liang Wang, Mahashweta Das, Menghai Pan, Shubham Jain, Uday Singh Saini, Vineeth Rakesh, Xin Dai, Xiran Fan, Yan Zheng, Yingtong Dou, Yujie Fan, Yuzhong Chen.

**Figure 2.** Grouped transactions from the same card, demon…

**Figure 3.** The overall model architecture of TREASURE.

**Figure 4.** The detailed input module of TREASURE. H represents the input to the Transformer decoder block. Numerical and categorical attributes are processed differently. Numerical attributes are first transformed to a logarithmic scale, as all numerical features in our dataset (e.g., transaction amounts, time differences between transactions) exhibit long-tail distributions. These log-scaled numerical attributes ar…

**Figure 5.** The detailed output module of TREASURE. Two…

**Figure 6.** Efficiency improvement through shared negative…

**Figure 7.** Embeddings generated by TREASURE demonstrate…

**Figure 9.** We developed a GUI to explore the embedding space.

**Figure 10.** Model performance scales with dataset size, with…

**Figure 11.** Performance scaling with model size, using 16-bit…
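Figure 4's caption states that all numerical attributes, such as transaction amounts and time differences between transactions, are moved to a logarithmic scale because they are long-tailed. Below is a minimal sketch of that step. It assumes the `log1p` variant, log(1 + x), so that a zero amount or a zero time gap maps to 0 rather than to negative infinity. The caption does not say which logarithm is used, and `log_scale` is a hypothetical helper.

```python
import math

def log_scale(values):
    """Compress long-tailed, non-negative numerical features with log(1 + x).

    log1p maps 0 -> 0, so a zero time gap between two transactions stays finite.
    """
    if any(v < 0 for v in values):
        raise ValueError("log1p scaling here assumes non-negative inputs")
    return [math.log1p(v) for v in values]

# Hypothetical transaction amounts spanning several orders of magnitude
amounts = [0.0, 9.0, 99.0, 9999.0]
scaled = log_scale(amounts)
# scaled is roughly [0.0, 2.30, 4.61, 9.21]: raw values from 9 to 9999
# (a spread of over 1000x) end up between about 2.30 and 9.21 (about 4x)
```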

Original abstract

Payment networks form the backbone of modern commerce, generating high volumes of transaction records from daily activities. Properly modeling this data can enable applications such as abnormal behavior detection and consumer-level insights for hyper-personalized experiences, ultimately improving people's lives. In this paper, we present TREASURE, TRansformer Engine As Scalable Universal transaction Representation Encoder, a multipurpose transformer-based foundation model specifically designed for transaction data. The model simultaneously captures both consumer behavior and payment network signals (such as response codes and system flags), providing comprehensive information necessary for applications like accurate recommendation systems and abnormal behavior detection. Verified with industry-grade datasets, TREASURE features three key capabilities: 1) an input module with dedicated sub-modules for static and dynamic attributes, enabling more efficient training and inference; 2) an efficient and effective training paradigm for predicting high-cardinality categorical attributes; and 3) demonstrated effectiveness as both a standalone model that increases abnormal behavior detection performance by 111% over production systems and an embedding provider that enhances recommendation models by 104%. We present key insights from extensive ablation studies, benchmarks against production models, and case studies, highlighting valuable knowledge gained from developing TREASURE.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

TREASURE adds practical input modules and training tweaks for payment data, but the 111% and 104% gains rest on undescribed proprietary datasets and baselines.


The one or two things to take away: TREASURE adds some sensible architectural pieces for transaction data, but the headline performance jumps are hard to verify without more information on the experiments.

What is actually new is the input module that separates static attributes from dynamic ones, which could make training more efficient for this data type. There is also a training paradigm focused on predicting high-cardinality categorical attributes, which are common in payment records such as response codes or merchant types. These choices build on standard transformers but adapt them to the specifics of high-volume financial logs.

The paper does well to test the model in two ways: as a standalone detector of abnormal behavior and as a source of embeddings that boost recommendation systems. The ablation studies and comparisons against production baselines show attention to practical use, and working with real industry-grade Visa datasets gives the paper weight that synthetic-data papers lack.

The soft spots are in the evaluation details. The claimed 111 percent improvement in detection and 104 percent improvement in recommendation come from comparisons on proprietary datasets. The text does not describe how the data was sampled, how labels for abnormal behavior were created or validated, what the train/validation/test splits look like, or which statistical tests support the relative gains. Without that, it is difficult to know whether the results would hold on other payment networks or whether they depend on particular characteristics of the Visa data and the existing production systems. This is a moderate issue: the core idea may still be sound, but it limits how far the numbers can be taken at face value.

This paper is for people working on machine learning applications in payments, fraud prevention, or consumer analytics. A reader who needs ideas for modeling sequential transaction data with mixed feature types could find the input module and training approach useful; someone looking for fully reproducible benchmarks or open datasets will probably not get much out of it. It deserves a serious referee, because its applied nature and the scale of the data make feedback worthwhile, even if the methods section needs revision. I would recommend sending it to peer review with specific requests for more detail on the experimental design and data handling.

Referee Report

2 major / 1 minor

Summary. The paper introduces TREASURE, a transformer-based foundation model for high-volume payment transaction data. It proposes a specialized input module with sub-modules for static and dynamic attributes, an efficient training objective for high-cardinality categorical attributes, and reports that the model improves abnormal behavior detection by 111% over production systems when used standalone and boosts recommendation performance by 104% when used to provide embeddings. These results are supported by ablation studies, benchmarks against production models, and case studies on industry-grade Visa datasets.

Significance. If the reported gains prove robust under detailed scrutiny, the work could meaningfully advance foundation-model approaches in financial transaction modeling, particularly for fraud detection and personalization tasks that rely on mixed static/dynamic categorical features. The emphasis on scalable handling of high-cardinality attributes and dual use as detector or embedder addresses practical constraints in payment networks. However, the proprietary datasets and absence of reproducible experimental protocols substantially limit current assessment of generalizability and impact.

major comments (2)

[Abstract and evaluation sections] The claims of 111% and 104% relative improvements are presented without any description of the underlying metrics, production baselines, dataset sampling procedure, label acquisition/validation process, train/validation/test splits, or statistical testing. These omissions are load-bearing because the central contribution rests on the magnitude and reliability of these gains on 'industry-grade' data.
[Training paradigm section] The model is trained to predict attributes drawn from the same class of industry transaction data later used for downstream evaluation, yet no mention is made of strictly held-out external benchmarks or independent validation sets. This creates a circularity risk that must be addressed to support the generalization claims.
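The first major comment turns on what a relative gain means. A relative improvement of 111% implies that the new score is 2.11 times the baseline score, so the absolute change depends entirely on the baseline. The sketch below uses made-up baseline values, since the page does not report the paper's metrics or baselines. It shows how the same 111% can correspond to very different absolute gains.

```python
def relative_improvement(new, baseline):
    """Relative gain in percent: (new - baseline) / baseline * 100."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (new - baseline) / baseline * 100.0

def score_after_gain(baseline, gain_pct):
    """Score implied by a given relative gain over a baseline."""
    return baseline * (1.0 + gain_pct / 100.0)

# Hypothetical baselines; not values from the paper.
for base in (0.05, 0.40):
    new = score_after_gain(base, 111.0)
    print(f"baseline={base:.2f} -> new={new:.4f} (absolute gain {new - base:.4f})")
# baseline=0.05 -> new=0.1055 (absolute gain 0.0555)
# baseline=0.40 -> new=0.8440 (absolute gain 0.4440)
```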

minor comments (1)

[Abstract] The acronym expansion contains inconsistent capitalization ('TRansformer Engine As Scalable Universal transaction Representation Encoder'); standardize for readability.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their constructive comments on our paper. We address each of the major comments in detail below. We agree that additional clarifications are needed in some areas and will make revisions accordingly. However, certain details regarding the proprietary Visa datasets cannot be fully disclosed due to privacy and confidentiality constraints.


Referee: [Abstract and evaluation sections] The claims of 111% and 104% relative improvements are presented without any description of the underlying metrics, production baselines, dataset sampling procedure, label acquisition/validation process, train/validation/test splits, or statistical testing. These omissions are load-bearing because the central contribution rests on the magnitude and reliability of these gains on 'industry-grade' data.

Authors: We appreciate this observation and agree that more transparency would benefit readers. Due to the proprietary and sensitive nature of the Visa transaction datasets, we are unable to provide exhaustive details on dataset sampling procedures, label acquisition and validation processes, or exact train/validation/test splits, as these could compromise data privacy and reveal proprietary business practices. We will revise the manuscript to include descriptions of the underlying metrics used for the reported improvements (such as the specific performance measures for abnormal behavior detection and recommendation tasks), general characteristics of the production baselines, and any statistical testing performed where possible without violating confidentiality. We believe these additions will address the core concern while respecting data constraints. The reported gains were validated through extensive internal benchmarks on industry-grade data. revision: partial
Referee: [Training paradigm section] The model is trained to predict attributes drawn from the same class of industry transaction data later used for downstream evaluation, yet no mention is made of strictly held-out external benchmarks or independent validation sets. This creates a circularity risk that must be addressed to support the generalization claims.

Authors: We acknowledge the potential for perceived circularity. The pretraining objective involves predicting attributes from a broad corpus of transaction data to learn universal representations. The downstream tasks, including abnormal behavior detection and recommendation, use separate evaluation datasets with task-specific labels that are not part of the pretraining attribute prediction. To mitigate concerns, we will update the training paradigm section to state explicitly that evaluation sets are held out and temporally separated from the pretraining data to prevent information leakage. While we do not have access to fully independent external public benchmarks due to the domain-specific nature of payment data, the internal validations use rigorous splits. We will add this clarification in the revision. revision: partial
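The rebuttal promises that evaluation sets will be held out and temporally separated from the pretraining data. One common way to do this is a time-based cutoff with an optional gap. The sketch below assumes that each record carries a timestamp. The cutoff date, gap length, and the `temporal_split` helper are hypothetical and are not taken from the paper.

```python
from datetime import date, timedelta

def temporal_split(records, cutoff, gap_days=0):
    """Split records by time so evaluation data is strictly later than training data.

    records:  iterable of (timestamp: date, payload) pairs
    cutoff:   training uses records with timestamp < cutoff
    gap_days: optional buffer after the cutoff that is dropped entirely,
              reducing leakage from sequences that straddle the boundary
    """
    eval_start = cutoff + timedelta(days=gap_days)
    train = [r for r in records if r[0] < cutoff]
    held_out = [r for r in records if r[0] >= eval_start]
    return train, held_out

records = [(date(2024, 1, d), f"txn{d}") for d in range(1, 11)]
train, held_out = temporal_split(records, cutoff=date(2024, 1, 6), gap_days=2)
# train covers Jan 1-5, Jan 6-7 fall in the gap, held_out covers Jan 8-10
```

The gap trades some evaluation data for a cleaner boundary. It matters most when pretraining sequences are long enough to reach past the cutoff.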

standing simulated objections not resolved

Full disclosure of dataset details, sampling procedures, and experimental protocols, which the proprietary nature of the Visa payment transaction data prevents.

Circularity Check

0 steps flagged

No significant circularity in derivation chain


The provided abstract and context describe a standard transformer foundation model with dedicated input sub-modules, a training paradigm for high-cardinality attribute prediction, and downstream empirical evaluations on abnormal behavior detection and recommendation tasks using industry-grade Visa datasets. No equations, self-citations, or load-bearing steps are exhibited that reduce any claimed prediction or result to its own inputs by construction. The performance numbers (111% and 104%) are presented as outcomes of comparisons against external production baselines rather than fitted parameters renamed as predictions or self-definitional constructs. The derivation is therefore self-contained as an empirical ML development process without the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard transformer assumptions plus domain-specific design choices for transaction data; no new physical entities are introduced, but many hyperparameters and data-handling decisions are implicit.

free parameters (2)

transformer hyperparameters
Number of layers, attention heads, and embedding dimensions are chosen to enable efficient training on high-volume data but are not enumerated in the abstract.
training objective weights
Balancing the prediction of multiple high-cardinality categorical attributes requires weighting choices that affect the learned representations.
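The weighting choice in the second free parameter can be made concrete. A paper passage quoted later on this page describes the objective as an abnormal-behavior loss plus a scaled sum of auxiliary losses, that is, L = L_abnormal + λ · Σ_k L_k. The sketch below assumes a single scale factor λ shared by all auxiliary terms. The paper may instead weight each attribute separately, and the attribute names used here are hypothetical.

```python
def combined_loss(abnormal_loss, auxiliary_losses, scale):
    """Total objective: main abnormal-behavior loss plus a scaled sum of auxiliary losses.

    auxiliary_losses: dict mapping an attribute name (hypothetical, e.g. one
                      per predicted categorical field) to its loss value
    scale:            a single weight applied to the auxiliary sum (a free parameter)
    """
    if scale < 0:
        raise ValueError("scale should be non-negative")
    return abnormal_loss + scale * sum(auxiliary_losses.values())

aux = {"merchant_category": 2.0, "response_code": 1.0}
print(combined_loss(0.5, aux, scale=0.1))   # about 0.8
print(combined_loss(0.5, aux, scale=1.0))   # 3.5
```

Changing `scale` shifts how much the representation is shaped by attribute prediction versus the main task. This is why the ledger counts it as a free parameter.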

axioms (2)

domain assumption Payment transaction records can be usefully decomposed into static customer attributes and dynamic sequence attributes.
This decomposition underpins the dedicated input sub-modules described in the abstract.
domain assumption Predicting high-cardinality categorical fields during pretraining yields representations that transfer to detection and recommendation tasks.
This is the core training paradigm claimed to be efficient and effective.
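The second axiom is where the training paradigm lives. A paper passage quoted later on this page mentions an InfoNCE loss for high-cardinality attributes. Figure 6's truncated caption also refers to an efficiency improvement "through shared negative…". Below is a minimal sketch of InfoNCE in which every position in a batch scores its true category against one pool of negatives shared by all positions. It assumes dot-product scores and pre-computed embeddings. It illustrates the general technique rather than the authors' implementation, and all names are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def infonce_shared_negatives(queries, targets, negatives, temperature=1.0):
    """Mean InfoNCE loss where all queries share one pool of negative embeddings.

    queries:   list of model output vectors, one per prediction position
    targets:   list of embeddings of the true category for each position
    negatives: list of embeddings of sampled categories, shared by every query
    Scoring against a small shared pool avoids a softmax over the full
    (possibly huge) category vocabulary at every position.
    """
    total = 0.0
    for q, pos in zip(queries, targets):
        logits = [dot(q, pos) / temperature] + [dot(q, n) / temperature for n in negatives]
        m = max(logits)  # subtract the max for numerical stability
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denominator - logits[0]  # -log softmax probability of the positive
    return total / len(queries)

queries = [[1.0, 0.0], [0.0, 1.0]]
targets = [[1.0, 0.0], [0.0, 1.0]]
negatives = [[-1.0, 0.0], [0.0, -1.0]]
loss = infonce_shared_negatives(queries, targets, negatives)
```

Sharing the pool means the negative embeddings are gathered once per batch rather than once per position, and each softmax runs over 1 + len(negatives) candidates instead of the whole vocabulary. This sketch does not mask collisions, where a sampled negative happens to be the true category.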

pith-pipeline@v0.9.0 · 5563 in / 1492 out tokens · 53651 ms · 2026-05-17T05:24:19.738195+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "Numerical attributes are first transformed to a logarithmic scale... log-normal distributions... InfoNCE loss for high-cardinality... L = L_abnormal + scaled sum of auxiliary losses"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "Transformer decoder block with causal masked self-attention... 3-layer, 4 heads, hidden dim 256"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

39 extracted references · 39 canonical work pages · 4 internal anchors

[1] Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katherine Millican, Malcolm Reynolds, et al. 2022. Flamingo: a visual language model for few-shot learning. Advances in neural information processing systems 35 (2022), 23716–23736

[3] Rishi Bommasani, Drew A Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, et al. 2021. On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258 (2021)

[4] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. Advances in neural information processing systems 33 (2020), 1877–1901

[5] Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 (2014)

[6] Abhimanyu Das, Weihao Kong, Rajat Sen, and Yichen Zhou. 2024. A decoder-only foundation model for time-series forecasting. In Forty-first International Conference on Machine Learning

[7] DeepSeek-AI. 2025. DeepSeek-V3. https://huggingface.co/deepseek-ai/DeepSeek-V3 Accessed: 2025-5-9

[8] Xiran Fan, Zhimeng Jiang, Chin-Chia Michael Yeh, Yuzhong Chen, Yingtong Dou, Menghai Pan, and Yan Zheng. 2025. Enhancing Foundation Models in Transaction Understanding with LLM-based Sentence Embeddings. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track. 903–911

[9] Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, Yongdong Zhang, and Meng Wang. 2020. Lightgcn: Simplifying and powering graph convolution network for recommendation. In Proceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval. 639–648

[10] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural computation 9, 8 (1997), 1735–1780

[11] Yifan Hu, Yehuda Koren, and Chris Volinsky. 2008. Collaborative filtering for implicit feedback datasets. In 2008 Eighth IEEE international conference on data mining. IEEE, 263–272

[12] Hugging Face. 2025. Llama4. https://huggingface.co/docs/transformers/model_doc/llama4 Accessed: 2025-5-9

[13] Kazuki Irie. 2024. Why Are Positional Encodings Nonessential for Deep Autoregressive Transformers? Revisiting a Petroglyph. arXiv preprint arXiv:2501.00659 (2024)

[14] Ju-yeong Ji and Ravin Kumar. 2024. Gemma explained: An overview of Gemma model family architectures. https://developers.googleblog.com/en/gemma-explained-overview-gemma-model-family-architectures/ Accessed: 2025-5-9

[15] Amirhossein Kazemnejad, Inkit Padhi, Karthikeyan Natesan Ramamurthy, Payel Das, and Siva Reddy. 2023. The impact of positional encoding on length generalization in transformers. Advances in Neural Information Processing Systems 36 (2023), 24892–24928

[16] Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix factorization techniques for recommender systems. Computer 42, 8 (2009), 30–37

[17] Leland McInnes, John Healy, and James Melville. 2018. Umap: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426 (2018)

[18] Aaron van den Oord, Yazhe Li, and Oriol Vinyals. 2018. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748 (2018)

[19] Karl Pearson. 1901. LIII. On lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin philosophical magazine and journal of science 2, 11 (1901), 559–572

[20] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. 2021. Learning transferable visual models from natural language supervision. In International conference on machine learning. PMLR, 8748–8763

[21] Kashif Rasul, Arjun Ashok, Andrew Robert Williams, Arian Khorasani, George Adamopoulos, Rishika Bhagwatkar, Marin Biloš, Hena Ghonia, Nadhir Hassen, Anderson Schneider, et al. 2023. Lag-llama: Towards foundation models for time series forecasting. In R0-FoMo: Robustness of Few-shot and Zero-shot Learning in Large Foundation Models

[22] Archit Rathore, Sunipa Dev, Jeff M Phillips, Vivek Srikumar, Yan Zheng, Chin-Chia Michael Yeh, Junpeng Wang, Wei Zhang, and Bei Wang. 2024. VERB: Visualizing and interpreting bias mitigation techniques geometrically for word representations. ACM Transactions on Interactive Intelligent Systems 14, 1 (2024), 1–34

[23] Oleksandr Shchur, Marin Biloš, and Stephan Günnemann. 2019. Intensity-free learning of temporal point processes. arXiv preprint arXiv:1909.12127 (2019)

[24] Piotr Skalski, David Sutton, Stuart Burrell, Iker Perez, and Jason Wong. 2023. Towards a foundation purchasing model: Pretrained generative autoregression on transaction sequences. In Proceedings of the Fourth ACM International Conference on AI in Finance. 141–149

[25] Boris Van Breugel and Mihaela Van Der Schaar. 2024. Why tabular foundation models should be a research priority. arXiv preprint arXiv:2405.01147 (2024)

[26] Laurens Van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of machine learning research 9, 11 (2008)

[27] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in neural information processing systems 30 (2017)

[28] Visa Inc. 2020. Smarter STIP (Stand-in-Processing). https://usa.visa.com/dam/VCOM/regional/na/us/about-visa/research/documents/smarter-stip.pdf Accessed: 2025-5-8

[29] Visa Inc. 2024. Visa Fact Sheet. https://corporate.visa.com/content/dam/VCOM/corporate/documents/about-visa-factsheet.pdf Accessed: 2025-5-5

[30] Visa Inc. 2025. Visa Intelligent Commerce. https://corporate.visa.com/en/products/intelligent-commerce.html Accessed: 2025-5-8

[31] Xiang Wang, Xiangnan He, Meng Wang, Fuli Feng, and Tat-Seng Chua. 2019. Neural graph collaborative filtering. In Proceedings of the 42nd international ACM SIGIR conference on Research and development in Information Retrieval. 165–174

[32] Wikipedia contributors. 2025. ISO 3166-1 numeric. Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/wiki/ISO_3166-1_numeric Accessed: 2025-5-17

[33] Yazheng Yang, Yuqi Wang, Guang Liu, Ledell Wu, and Qi Liu. 2023. Unitabe: A universal pretraining protocol for tabular foundation model in data science. arXiv preprint arXiv:2307.09249 (2023)

[34] Chin-Chia Michael Yeh, Xin Dai, Huiyuan Chen, Yan Zheng, Yujie Fan, Audrey Der, Vivian Lai, Zhongfang Zhuang, Junpeng Wang, Liang Wang, et al. 2023. Toward a foundation model for time series data. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management. 4400–4404

[35] Chin-Chia Michael Yeh, Mengting Gu, Yan Zheng, Huiyuan Chen, Javid Ebrahimi, Zhongfang Zhuang, Junpeng Wang, Liang Wang, and Wei Zhang. 2022. Embedding compression with hashing for efficient representation learning in large-scale graph. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 4391–4401

[36] Chin-Chia Michael Yeh, Vivian Lai, Uday Singh Saini, Xiran Fan, Yujie Fan, Junpeng Wang, Xin Dai, and Yan Zheng. 2025. Empowering Time Series Forecasting with LLM-Agents. arXiv preprint arXiv:2508.04231 (2025)

[37] Chin-Chia Michael Yeh, Uday Singh Saini, Junpeng Wang, Xin Dai, Xiran Fan, Yujie Sun, Jiarui Fan, and Yan Zheng. 2025. TiCT: A Synthetically Pre-Trained Foundation Model for Time Series Classification. arXiv preprint arXiv:2511.19694 (2025)

[38] Dongyu Zhang, Liang Wang, Xin Dai, Shubham Jain, Junpeng Wang, Yujie Fan, Chin-Chia Michael Yeh, Yan Zheng, Zhongfang Zhuang, and Wei Zhang. 2023. Fata-trans: Field and time-aware transformer for sequential tabular data. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management. 3247–3256

[39] Yan Zheng, Junpeng Wang, Chin-Chia Michael Yeh, Yujie Fan, Huiyuan Chen, Liang Wang, and Wei Zhang. 2023. Embeddingtree: Hierarchical exploration of entity features in embedding. In 2023 IEEE 16th Pacific Visualization Symposium (PacificVis). IEEE, 217–221