FastICA with Learned Scores from the Empirical Characteristic Function

David Watts; Jonathan H. Manton

arxiv: 2604.22125 · v1 · submitted 2026-04-24 · 📡 eess.SP

FastICA with Learned Scores from the Empirical Characteristic Function

David Watts , Jonathan H. Manton This is my paper

Pith reviewed 2026-05-08 10:46 UTC · model grok-4.3

classification 📡 eess.SP

keywords independent component analysisFastICAempirical characteristic functionsource separationnonlinear score functionblind source separationsignal processing

0 comments

The pith

FastICA can learn its nonlinear score function directly from the empirical characteristic function of the observed mixtures, removing the need to guess it in advance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to adapt the popular FastICA algorithm so that it estimates a suitable nonlinear function straight from the data rather than requiring the user to pick one that matches the unknown source distributions. This learned function comes from the empirical characteristic function computed on the mixtures. Experiments demonstrate that the resulting separation error remains close to the performance of the single best fixed function even when sources are heavy-tailed or discrete. The computational cost stays comparable to ordinary FastICA because the extra estimation step is inexpensive. A sympathetic reader would care because many real signals have unknown distributions, so an automatic choice removes a common source of failure.

Core claim

By replacing the fixed nonlinear score function in FastICA with one estimated from the empirical characteristic function of the observed data, the algorithm recovers independent sources with separation quality that stays near the best hand-tuned choice across heavy-tailed and discrete source mixtures while preserving the original runtime.

What carries the argument

The score function derived from the empirical characteristic function of the mixtures, which replaces the user-supplied nonlinearity inside the FastICA iteration.

If this is right

Separation error remains close to the optimum obtained by the single best fixed function for heavy-tailed and discrete sources.
Runtime stays comparable to standard FastICA because the empirical characteristic function is cheap to compute.
No user choice of nonlinearity is required when the source distribution is unknown.
The approach works for synthetic mixtures without assuming any particular parametric family for the sources.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same estimation idea could be inserted into other fixed-point ICA algorithms that rely on a nonlinearity.
Online versions might update the score function as new data arrives, allowing the demixing matrix to track slowly changing source statistics.
The method might reduce the performance gap between FastICA and slower but distribution-adaptive algorithms in practice.

Load-bearing premise

The empirical characteristic function computed from the mixtures alone is sufficient to produce a nonlinear score that recovers the independent sources without knowing their true distributions or the mixing matrix.

What would settle it

Run the method on a fresh collection of linear mixtures whose sources have distributions never seen in the training experiments and measure whether the separation error exceeds that of the best fixed nonlinearity by a large margin.

Figures

Figures reproduced from arXiv: 2604.22125 by David Watts, Jonathan H. Manton.

**Figure 1.** Figure 1: Performance over Nmc = 100 Monte Carlo trials for m = 8 sources; each panel summarises the distribution of trial-wise Amari error or runtime. d) Tabulation grid size J: Whereas R, B, and L primarily affect statistical error through the P-bECF estimator, J controls the numerical approximation error of the tabulated FastICA nonlinearity g and its derivative g ′ . We tabulate g on a uniform grid over a bounde… view at source ↗

read the original abstract

Independent component analysis (ICA) estimates a demixing matrix that can recover statistically independent sources from linear mixtures. FastICA is a popular ICA algorithm due to its efficiency, but its performance strongly depends on a user-chosen nonlinear function matched to the source distribution. When the source distribution is unknown, this function must be guessed at, and incorrect guesses can lead to significant drops in performance. We remove the need to guess by estimating a suitable function directly from the observed data. Our experiments show that the separation error stays close to the best fixed choice across synthetic mixtures comprising heavy-tailed or discrete sources while retaining a FastICA-like runtime.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper estimates a nonlinearity for FastICA from the empirical characteristic function of the mixtures and gets close to optimal separation on the tested synthetics, but the step from joint ECF to useful univariate score still needs a clearer justification.

read the letter

This paper estimates the nonlinearity for FastICA directly from the empirical characteristic function of the observed mixtures. That removes the usual need to guess a function when the source distributions are unknown, and the reported experiments keep separation error near the best fixed choice on synthetic heavy-tailed and discrete sources while holding runtime similar to standard FastICA.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces a modification to FastICA in which the nonlinear score function is estimated directly from the empirical characteristic function of the observed mixtures rather than chosen by the user. The central claim is that this data-driven score yields separation performance close to that of the best fixed nonlinearity across synthetic mixtures with heavy-tailed or discrete sources, while preserving FastICA-like computational cost.

Significance. A reliable method for automatically selecting the nonlinearity in FastICA would be a meaningful practical advance, as the performance of the algorithm is known to be sensitive to this choice when source distributions are unknown. The use of the empirical characteristic function as the basis for learning the score is an interesting data-driven idea that could reduce reliance on prior knowledge. However, the absence of a clear derivation showing how a univariate score is obtained from the joint ECF of the mixtures (without explicit knowledge of the mixing matrix) limits the assessed significance at present.

major comments (2)

[Abstract and proposed method] The abstract and the description of the method state that a suitable nonlinear score is estimated directly from the empirical characteristic function of the observed mixtures X = A S. Because the joint characteristic function satisfies φ_X(t) = ∏_i φ_{s_i}(a_i^T t), any procedure that produces a univariate g from φ_X must implicitly handle the unknown linear transformation A. The manuscript provides no explicit mapping, marginalization, or surrogate that justifies why a direct functional of the joint ECF yields an effective g for the demixed coordinates; this step is load-bearing for the claim that no knowledge of A or the source distributions is required.
[Experiments] The experimental claims that 'the separation error stays close to the best fixed choice' and that performance is 'near-optimal' are not supported by sufficient detail. No quantitative metrics (e.g., Amari distance, SIR), exact source families and mixing matrices, number of trials, baseline implementations, or failure cases are reported, making it impossible to verify whether the observed closeness holds beyond the particular synthetic cases tested or is an artifact of the chosen source families.

minor comments (1)

[Notation and method] The notation used for the empirical characteristic function, the learned score, and any associated estimators should be introduced with explicit equations early in the manuscript to improve clarity and reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript. The feedback identifies key areas where additional clarification and detail will strengthen the presentation. We address each major comment below and commit to revisions that directly respond to the concerns raised.

read point-by-point responses

Referee: [Abstract and proposed method] The abstract and the description of the method state that a suitable nonlinear score is estimated directly from the empirical characteristic function of the observed mixtures X = A S. Because the joint characteristic function satisfies φ_X(t) = ∏_i φ_{s_i}(a_i^T t), any procedure that produces a univariate g from φ_X must implicitly handle the unknown linear transformation A. The manuscript provides no explicit mapping, marginalization, or surrogate that justifies why a direct functional of the joint ECF yields an effective g for the demixed coordinates; this step is load-bearing for the claim that no knowledge of A or the source distributions is required.

Authors: We thank the referee for this observation. The manuscript describes the overall procedure but does not contain a self-contained derivation showing how the joint ECF is reduced to a univariate score without knowledge of A. We will add a new subsection in the revised method section that provides this justification: we explain that the FastICA fixed-point iteration supplies a running estimate of each demixing vector, allowing the joint ECF to be evaluated along that one-dimensional direction to produce an effective univariate characteristic function; the score is then obtained from the derivative of the log-density approximated via the ECF. We will include the corresponding equations, a short proof sketch, and pseudocode. This addition directly addresses the load-bearing step identified by the referee. revision: yes
Referee: [Experiments] The experimental claims that 'the separation error stays close to the best fixed choice' and that performance is 'near-optimal' are not supported by sufficient detail. No quantitative metrics (e.g., Amari distance, SIR), exact source families and mixing matrices, number of trials, baseline implementations, or failure cases are reported, making it impossible to verify whether the observed closeness holds beyond the particular synthetic cases tested or is an artifact of the chosen source families.

Authors: We agree that the experimental section requires substantially more detail to support the claims and enable reproducibility. In the revised manuscript we will: (i) report quantitative metrics (Amari distance and SIR) with means and standard deviations; (ii) list the exact source families (Laplace, uniform, Bernoulli, etc.) and the mixing matrices (random orthogonal and ill-conditioned); (iii) state the number of Monte Carlo trials (50); (iv) specify the baseline fixed nonlinearities and their implementations; and (v) add a paragraph discussing failure cases (e.g., Gaussian sources or very small sample sizes). These changes will allow readers to verify that performance remains close to the best fixed choice across the tested regimes. revision: yes

Circularity Check

0 steps flagged

No significant circularity; estimation procedure is data-driven and independent of target outputs

full rationale

The paper describes estimating a nonlinear score function directly from the empirical characteristic function computed on the observed mixtures, then plugging the result into the standard FastICA iteration. This is an explicit estimation step from data properties rather than a self-referential definition, a fitted parameter renamed as a prediction, or a load-bearing self-citation chain. No equations or claims in the abstract or described method reduce the learned score to the separation error or mixing matrix by construction. Experiments compare against fixed nonlinearities on synthetic data, providing external validation rather than tautological confirmation. The approach is therefore self-contained as a heuristic estimation technique.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the standard ICA assumption of linear mixtures of independent sources plus the novel claim that the empirical characteristic function yields an effective score function.

axioms (2)

domain assumption Observed signals are linear mixtures of statistically independent sources.
This is the foundational assumption of all ICA methods.
ad hoc to paper The empirical characteristic function of the mixtures encodes sufficient information to construct an effective nonlinear score function for separation.
This is the core technical assumption introduced by the paper.

pith-pipeline@v0.9.0 · 5391 in / 1268 out tokens · 68622 ms · 2026-05-08T10:46:10.956434+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

15 extracted references · 15 canonical work pages

[1]

Fast and robust fixed-point algorithms for independent component analysis,

A. Hyvarinen, “Fast and robust fixed-point algorithms for independent component analysis,”IEEE transactions on Neural Networks, vol. 10, no. 3, pp. 626–634, 1999

work page 1999
[2]

Efficient variant of algorithm fastica for independent component analysis attaining the cram ´er-rao lower bound,

Z. Koldovsky, P. Tichavsky, and E. Oja, “Efficient variant of algorithm fastica for independent component analysis attaining the cram ´er-rao lower bound,”IEEE Transactions on neural networks, vol. 17, no. 5, pp. 1265–1277, 2006

work page 2006
[3]

Fast algorithms for mutual information based independent component analysis,

D.-T. Pham, “Fast algorithms for mutual information based independent component analysis,”IEEE Transactions on Signal Processing, vol. 52, no. 10, pp. 2690–2700, 2004

work page 2004
[4]

Ica using spacings estimates of entropy,

E. G. Learned-Milleret al., “Ica using spacings estimates of entropy,” Journal of machine learning research, vol. 4, no. Dec, pp. 1271–1295, 2003

work page 2003
[5]

Characteristic-function-based independent component analysis,

J. Eriksson and V . Koivunen, “Characteristic-function-based independent component analysis,”Signal Processing, vol. 83, no. 10, pp. 2195–2208, 2003

work page 2003
[6]

Consistent independent component analysis and prewhitening,

A. Chen and P. J. Bickel, “Consistent independent component analysis and prewhitening,”IEEE Transactions on Signal Processing, vol. 53, no. 10, pp. 3625–3632, 2005

work page 2005
[7]

Estimation of independent component analysis systems,

V . Starck, “Estimation of independent component analysis systems,” arXiv preprint arXiv:2511.04273, 2025

work page arXiv 2025
[8]

Nonparametric evaluation of noisy ica solutions,

S. Kumar, P. Sarkar, P. Bickel, and D. Bean, “Nonparametric evaluation of noisy ica solutions,” inAdvances in Neural Information Processing Systems, 2024

work page 2024
[9]

Fold: Efficient fourier-series-based score estimation for langevin diffusion,

S. Asokan, A. Srikanth, N. Shetty, and C. S. Seelamantula, “Fold: Efficient fourier-series-based score estimation for langevin diffusion,” 2024, extended abstract

work page 2024
[10]

Application of the empirical characteristic function to compare and estimate densities by pooling information,

L. Ferr ´e and J. Whittaker, “Application of the empirical characteristic function to compare and estimate densities by pooling information,” Computational Statistics, vol. 19, no. 2, pp. 169–192, May 2004

work page 2004
[11]

N. G. Ushakov,Selected topics in characteristic functions. Walter de Gruyter, 2011

work page 2011
[12]

B. W. Silverman,Density estimation for statistics and data analysis. CRC press, 1986, vol. 26

work page 1986
[13]

On the errors involved in computing the empirical characteristic function,

M. Jones and H. Lotwick, “On the errors involved in computing the empirical characteristic function,”Journal of Statistical Computation and Simulation, vol. 17, no. 2, pp. 133–149, 1983

work page 1983
[14]

On the determination of probability distributions of more dimensions by their projections,

A. Heppes, “On the determination of probability distributions of more dimensions by their projections,”Acta Mathematica Hungarica, vol. 7, no. 3-4, pp. 403–410, 1956

work page 1956
[15]

A new learning algorithm for blind signal separation,

S.-i. Amari, A. Cichocki, and H. Yang, “A new learning algorithm for blind signal separation,”Advances in neural information processing systems, vol. 8, 1995

work page 1995

[1] [1]

Fast and robust fixed-point algorithms for independent component analysis,

A. Hyvarinen, “Fast and robust fixed-point algorithms for independent component analysis,”IEEE transactions on Neural Networks, vol. 10, no. 3, pp. 626–634, 1999

work page 1999

[2] [2]

Efficient variant of algorithm fastica for independent component analysis attaining the cram ´er-rao lower bound,

Z. Koldovsky, P. Tichavsky, and E. Oja, “Efficient variant of algorithm fastica for independent component analysis attaining the cram ´er-rao lower bound,”IEEE Transactions on neural networks, vol. 17, no. 5, pp. 1265–1277, 2006

work page 2006

[3] [3]

Fast algorithms for mutual information based independent component analysis,

D.-T. Pham, “Fast algorithms for mutual information based independent component analysis,”IEEE Transactions on Signal Processing, vol. 52, no. 10, pp. 2690–2700, 2004

work page 2004

[4] [4]

Ica using spacings estimates of entropy,

E. G. Learned-Milleret al., “Ica using spacings estimates of entropy,” Journal of machine learning research, vol. 4, no. Dec, pp. 1271–1295, 2003

work page 2003

[5] [5]

Characteristic-function-based independent component analysis,

J. Eriksson and V . Koivunen, “Characteristic-function-based independent component analysis,”Signal Processing, vol. 83, no. 10, pp. 2195–2208, 2003

work page 2003

[6] [6]

Consistent independent component analysis and prewhitening,

A. Chen and P. J. Bickel, “Consistent independent component analysis and prewhitening,”IEEE Transactions on Signal Processing, vol. 53, no. 10, pp. 3625–3632, 2005

work page 2005

[7] [7]

Estimation of independent component analysis systems,

V . Starck, “Estimation of independent component analysis systems,” arXiv preprint arXiv:2511.04273, 2025

work page arXiv 2025

[8] [8]

Nonparametric evaluation of noisy ica solutions,

S. Kumar, P. Sarkar, P. Bickel, and D. Bean, “Nonparametric evaluation of noisy ica solutions,” inAdvances in Neural Information Processing Systems, 2024

work page 2024

[9] [9]

Fold: Efficient fourier-series-based score estimation for langevin diffusion,

S. Asokan, A. Srikanth, N. Shetty, and C. S. Seelamantula, “Fold: Efficient fourier-series-based score estimation for langevin diffusion,” 2024, extended abstract

work page 2024

[10] [10]

Application of the empirical characteristic function to compare and estimate densities by pooling information,

L. Ferr ´e and J. Whittaker, “Application of the empirical characteristic function to compare and estimate densities by pooling information,” Computational Statistics, vol. 19, no. 2, pp. 169–192, May 2004

work page 2004

[11] [11]

N. G. Ushakov,Selected topics in characteristic functions. Walter de Gruyter, 2011

work page 2011

[12] [12]

B. W. Silverman,Density estimation for statistics and data analysis. CRC press, 1986, vol. 26

work page 1986

[13] [13]

On the errors involved in computing the empirical characteristic function,

M. Jones and H. Lotwick, “On the errors involved in computing the empirical characteristic function,”Journal of Statistical Computation and Simulation, vol. 17, no. 2, pp. 133–149, 1983

work page 1983

[14] [14]

On the determination of probability distributions of more dimensions by their projections,

A. Heppes, “On the determination of probability distributions of more dimensions by their projections,”Acta Mathematica Hungarica, vol. 7, no. 3-4, pp. 403–410, 1956

work page 1956

[15] [15]

A new learning algorithm for blind signal separation,

S.-i. Amari, A. Cichocki, and H. Yang, “A new learning algorithm for blind signal separation,”Advances in neural information processing systems, vol. 8, 1995

work page 1995