AI-BAAM: AI-Driven Bank Statement Analytics as Alternative Data for Malaysian MSME Credit Scoring

Chun Chet Ng; Jia Yu Lim; Jin Khye Tan; Wei Zeng Low; Yin Yin Boon; Zhen Hao Chu

arxiv: 2510.16066 · v4 · submitted 2025-10-17 · q-fin.ST · cs.AI · cs.CE · cs.CY · cs.LG · q-fin.RM

Pith reviewed 2026-05-18 06:36 UTC · model grok-4.3

keywords: bank statement analytics, alternative data, credit scoring, MSME financing, machine learning, financial inclusion, Malaysia, cash flow features

The pith

Bank statement features raise credit-scoring accuracy for Malaysian MSMEs to an AUROC of 0.806, a 24.6 percent gain over application data alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper examines whether raw bank transaction histories can stand in for missing credit-bureau records when lenders assess loans to newly formed or thin-file micro and small businesses in Malaysia. The authors build an end-to-end pipeline that converts statement line items into cash-flow metrics and feeds them into standard machine-learning classifiers. On a new collection of 611 applicants, the strongest model reaches an AUROC of 0.806 on the validation set, a 24.6 percent relative lift over models that use only the usual application fields. If the improvement generalizes, lenders gain a concrete route to extend formal credit to many more MSMEs that currently sit outside conventional underwriting systems.

Core claim

The authors demonstrate that a cash-flow-based underwriting pipeline which extracts and encodes transaction patterns from bank statements produces credit-scoring models whose validation performance materially exceeds that of models built on application information alone, attaining an AUROC of 0.806 and a 24.6 percent relative improvement.
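
As a quick check on the headline numbers, the application-only baseline can be backed out from the two reported figures. This is a minimal sketch that assumes the 24.6 percent figure is a relative gain in AUROC (as described above) rather than a gain in percentage points; the function name is illustrative and does not come from the paper.

```python
def implied_baseline(auroc_with_features: float, relative_lift: float) -> float:
    """Back out the baseline AUROC from the improved AUROC and a relative lift."""
    return auroc_with_features / (1.0 + relative_lift)


# Reported figures: AUROC of 0.806 and a 24.6% relative improvement.
baseline = implied_baseline(0.806, 0.246)
absolute_gain = 0.806 - baseline

print(f"implied baseline AUROC: {baseline:.3f}")   # 0.647
print(f"absolute AUROC gain:    {absolute_gain:.3f}")  # 0.159
```

Under that reading, the application-only models sit at roughly 0.647, modestly above the 0.5 of a random ranking, and the bank-statement features add about 0.16 in absolute AUROC.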

What carries the argument

A cash-flow-based underwriting pipeline that turns bank-statement line items into predictive transaction features and combines them with machine-learning classifiers for MSME credit risk assessment.
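
To make the idea of transaction-derived features concrete, the sketch below aggregates statement line items by calendar month and summarises them. The input layout, the sample amounts, and every feature name are assumptions chosen for illustration; the source does not list the features the authors actually extract.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical statement line items: (posting date, signed amount in MYR).
# Positive amounts are inflows (credits); negative amounts are outflows (debits).
transactions = [
    (date(2024, 1, 5), 12000.0),
    (date(2024, 1, 18), -7000.0),
    (date(2024, 2, 3), 9000.0),
    (date(2024, 2, 20), -9500.0),
    (date(2024, 3, 9), 15000.0),
    (date(2024, 3, 27), -8000.0),
]


def cash_flow_features(rows):
    """Aggregate line items per calendar month and summarise them as features."""
    inflow = defaultdict(float)
    outflow = defaultdict(float)
    for posted, amount in rows:
        month = (posted.year, posted.month)
        if amount >= 0:
            inflow[month] += amount
        else:
            outflow[month] += -amount

    months = sorted(set(inflow) | set(outflow))
    if not months:
        raise ValueError("no transactions supplied")

    net = [inflow[m] - outflow[m] for m in months]
    total_in = sum(inflow.values())
    total_out = sum(outflow.values())
    return {
        "n_months": len(months),
        "avg_monthly_inflow": total_in / len(months),
        "avg_monthly_outflow": total_out / len(months),
        "avg_monthly_net": mean(net),
        "negative_net_months": sum(1 for x in net if x < 0),
        "outflow_to_inflow_ratio": total_out / total_in if total_in else float("inf"),
    }


features = cash_flow_features(transactions)
print(features)
```

A feature dictionary like this would be joined to the application fields before model training; which aggregates carry real signal is an empirical question the sketch does not address.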

If this is right

MSMEs lacking credit histories obtain a practical route to formal financing through automated analysis of their existing bank records.
Lenders in emerging markets can reduce dependence on credit-bureau data that many new or small firms have never generated.
Release of the anonymized transaction dataset supports additional research on financial inclusion for Malaysian MSMEs.
Full automation of statement processing shortens underwriting time and cost for small-business loans.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same transaction-derived features could be tested for predictive value in other emerging economies with similar retail-banking practices.
Future studies might examine whether performance remains stable when the pipeline is applied to firms drawn from banks or regions outside the original consulting-firm sample.
Real-time API connections to bank data could convert the pipeline into an instant-decision tool usable by loan officers at the point of application.

Load-bearing premise

The 611-applicant sample collected from a single Malaysian consulting firm is representative of the broader population of MSME loan applicants, and the extracted transaction features carry stable predictive signal rather than firm-specific or selection artifacts.

What would settle it

A replication on an independent, multi-source dataset of several thousand Malaysian MSME applicants that shows no AUROC improvement or a performance decline when the bank-statement features are added to the model.

Original abstract

Despite accounting for 96.1% of all businesses in Malaysia, access to financing remains one of the most persistent challenges faced by Micro, Small, and Medium Enterprises (MSMEs). Newly established businesses are often excluded from formal credit markets as traditional underwriting approaches rely heavily on credit bureau data. This study investigates the potential of bank statement data as an alternative data source for credit assessment to promote financial inclusion in emerging markets. First, we propose a cash flow-based underwriting pipeline where we utilize bank statement data for end-to-end data extraction and machine learning credit scoring. Second, we introduce a novel dataset of 611 loan applicants from a Malaysian consulting firm. Third, we develop and evaluate credit scoring models based on application information and bank transaction-derived features. Empirical results demonstrate that incorporating bank statement features yields substantial improvements, with our best model achieving an AUROC of 0.806 on validation set, representing a 24.6% improvement over models using application information only. Finally, we will release the anonymized bank transaction dataset to facilitate further research on MSME financial inclusion within Malaysia's emerging economy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper supplies a new 611-record Malaysian MSME bank-statement dataset and reports a 24.6% AUROC lift to 0.806, but the single consulting-firm source makes that lift hard to trust without further checks.

This paper's clearest contribution is the 611-applicant Malaysian MSME bank-transaction dataset, together with the claim that adding cash-flow features from those statements raises AUROC to 0.806 on a validation set, a 24.6 percent relative gain over application-only baselines. The authors plan to release the anonymized data, which would fill a gap for local work on financial inclusion, where traditional credit files are thin for most businesses.

Referee Report

3 major / 2 minor

Summary. The paper proposes an AI-driven pipeline for end-to-end extraction and analysis of bank statement data as alternative input for credit scoring of Malaysian MSMEs. It introduces a new dataset of 611 loan applicants sourced from a single consulting firm and reports that models incorporating bank-transaction features achieve an AUROC of 0.806 on a validation set, a 24.6% lift relative to models using only application information. The authors state they will release the anonymized transaction dataset.

Significance. If the empirical improvement can be substantiated with full methodological transparency, the work would be significant for financial-inclusion research in emerging markets by demonstrating a practical route to underwrite MSMEs that lack credit-bureau histories. The planned public release of the 611-applicant bank-transaction dataset constitutes a concrete contribution that would enable reproducibility and follow-on studies.

major comments (3)

[Abstract] Abstract: the headline claim of an AUROC of 0.806 (24.6% improvement) is presented without any description of the feature-extraction rules applied to the bank statements, the machine-learning model families, hyper-parameter search procedure, cross-validation scheme, or explicit checks for data leakage and overfitting. These omissions render the numerical result impossible to evaluate or replicate.
[Data] Data section: the entire sample of 611 applicants originates from one Malaysian consulting firm. No discussion is provided of how the firm selects or filters its clients, nor of any steps taken to assess whether the extracted cash-flow features reflect stable, population-level signals rather than firm-specific artifacts or selection effects.
[Results] Results: with only 611 observations, even modest feature engineering on transaction variables can produce spurious correlations. The manuscript supplies no information on sample-size considerations, statistical significance of the lift, or external validation that would support the generalizability claim.
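
The sample-size concern can be given a rough scale with the Hanley and McNeil (1982) approximation for the standard error of an AUROC. This is only a sketch: the source does not state the validation-set size or the default rate, so the class counts below are hypothetical placeholders, and the resulting interval is illustrative rather than a property of the paper's result.

```python
import math


def hanley_mcneil_se(auc: float, n_pos: int, n_neg: int) -> float:
    """Approximate standard error of an AUROC (Hanley & McNeil, 1982)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc * auc / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc * auc)
        + (n_neg - 1) * (q2 - auc * auc)
    ) / (n_pos * n_neg)
    return math.sqrt(variance)


# Hypothetical validation split: neither count is reported in the source.
n_pos, n_neg = 30, 92  # assumed defaulters / non-defaulters
se = hanley_mcneil_se(0.806, n_pos, n_neg)
low, high = 0.806 - 1.96 * se, 0.806 + 1.96 * se

print(f"SE ~ {se:.3f}; approximate 95% interval [{low:.3f}, {high:.3f}]")
```

With counts of this order the standard error is about 0.05, so the interval spans roughly a tenth of an AUROC point on either side of the estimate. That is the kind of uncertainty statement the comment asks the authors to report with their actual counts.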

minor comments (2)

[Abstract] The abstract states that the dataset 'will be released'; the manuscript should specify the intended repository, license, and timeline to make this commitment concrete.
Notation for the extracted transaction features (e.g., cash-flow ratios, frequency counts) should be defined explicitly in a table or appendix to aid reproducibility.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment below, indicating where revisions will be made to improve methodological transparency and acknowledge limitations.

Referee: [Abstract] Abstract: the headline claim of an AUROC of 0.806 (24.6% improvement) is presented without any description of the feature-extraction rules applied to the bank statements, the machine-learning model families, hyper-parameter search procedure, cross-validation scheme, or explicit checks for data leakage and overfitting. These omissions render the numerical result impossible to evaluate or replicate.

Authors: We agree that the abstract is too concise to convey the necessary methodological context. The full manuscript details the bank-statement parsing pipeline (rule-based extraction of inflows/outflows combined with NLP for transaction categorization), the ensemble models (gradient boosting and random forests), grid-search hyper-parameter tuning, and stratified 5-fold cross-validation with temporal hold-out to mitigate leakage. In the revision we will expand the abstract with a brief methods summary and an explicit reference to the validation scheme while staying within length constraints.
revision: yes

Referee: [Data] Data section: the entire sample of 611 applicants originates from one Malaysian consulting firm. No discussion is provided of how the firm selects or filters its clients, nor of any steps taken to assess whether the extracted cash-flow features reflect stable, population-level signals rather than firm-specific artifacts or selection effects.

Authors: We acknowledge that the single-firm origin is a limitation. The consulting firm assists MSMEs seeking financing; clients are those who voluntarily approached the firm for support. We will add a description of the firm's client-acquisition process and a dedicated Limitations subsection that explicitly discusses potential selection effects and the possibility that some cash-flow patterns may be firm-specific. This will frame the study as an initial demonstration rather than a population-wide claim.
revision: yes

Referee: [Results] Results: with only 611 observations, even modest feature engineering on transaction variables can produce spurious correlations. The manuscript supplies no information on sample-size considerations, statistical significance of the lift, or external validation that would support the generalizability claim.

Authors: The concern about sample size and spurious correlations is valid. We will add a feature-to-sample ratio discussion, regularization details, bootstrap confidence intervals around the AUROC, and a statistical comparison (e.g., DeLong test) of the performance lift. We will also state that external validation on an independent cohort is not available with the current data and will treat this as an explicit limitation while recommending multi-source replication in future work.
revision: partial
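
The bootstrap interval mentioned in the last response could look roughly like the sketch below. It uses a percentile bootstrap over applicants and a rank-based AUROC, both written in plain Python. The labels and scores are synthetic stand-ins, not the paper's data, and the function names and defaults are illustrative. The DeLong comparison is not covered here.

```python
import random


def auroc(labels, scores):
    """AUROC as the probability that a positive outranks a negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval, resampling applicants with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # both classes are needed to define AUROC
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic stand-in data (NOT the paper's dataset): 30 defaulters, 90 non-defaulters,
# with defaulter scores shifted upward by one standard deviation.
rng = random.Random(42)
labels = [1] * 30 + [0] * 90
scores = [rng.gauss(1.0 if y else 0.0, 1.0) for y in labels]

point = auroc(labels, scores)
low, high = bootstrap_ci(labels, scores)
print(f"AUROC {point:.3f}, 95% bootstrap interval [{low:.3f}, {high:.3f}]")
```

Resampling whole applicants keeps each label paired with its score. For comparing two models, the same resampled indices would be applied to both score vectors so that the interval on the difference reflects the paired design.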

standing simulated objections not resolved

External validation on an independent dataset from a different source or region, which is not currently accessible.

Circularity Check

0 steps flagged

No circularity: standard empirical ML evaluation on held-out validation set

The paper reports an empirical machine-learning study on a fixed dataset of 611 applicants. Bank-statement features are extracted once, models are trained on a training partition, and AUROC is measured on a separate validation partition. The 0.806 AUROC and 24.6% lift versus the application-only baseline are therefore direct empirical comparisons on unseen data, not quantities that reduce by construction to the fitted parameters or to any self-citation. No equations, uniqueness theorems, or ansatzes are invoked; the derivation chain is simply feature extraction followed by standard supervised learning and hold-out evaluation.
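
The train/validation separation described above can be illustrated with a stratified hold-out split and an explicit disjointness check. This is a sketch under assumed conditions: the class balance, the 20 percent validation fraction, and the function name are hypothetical, since the source does not give the actual partition sizes.

```python
import random


def stratified_split(labels, valid_fraction=0.2, seed=0):
    """Return (train_idx, valid_idx) with similar class balance in each part."""
    rng = random.Random(seed)
    train_idx, valid_idx = [], []
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        n_valid = round(len(members) * valid_fraction)
        valid_idx.extend(members[:n_valid])
        train_idx.extend(members[n_valid:])
    return sorted(train_idx), sorted(valid_idx)


# Hypothetical labels for 611 applicants; the default rate is assumed, not reported.
labels = [1] * 150 + [0] * 461
train_idx, valid_idx = stratified_split(labels)

# Leakage guard: no applicant may appear on both sides of the split.
assert set(train_idx).isdisjoint(valid_idx)
print(len(train_idx), len(valid_idx))  # 489 122
```

Any feature scaling or selection would then be fitted on `train_idx` only, since fitting it on all 611 rows would let validation information leak into training.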

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the domain assumption that bank-transaction histories contain stable, non-spurious signals of repayment ability for new businesses and on standard supervised-learning assumptions that a modest sample size suffices for reliable AUROC estimation.

axioms (2)

domain assumption: Bank transaction patterns are predictive of creditworthiness for MSMEs lacking credit-bureau records.
Invoked by the decision to treat extracted cash-flow features as alternative data for underwriting.
domain assumption: The 611-applicant sample is sufficiently representative for model training and validation.
Required for the reported AUROC to generalize beyond the consulting-firm source.

pith-pipeline@v0.9.0 · 5758 in / 1370 out tokens · 35750 ms · 2026-05-18T06:36:22.948730+00:00 · methodology
