Deep Multi-View Learning via Task-Optimal CCA

Charles M. Perou; Heather D. Couture; J.S. Marron; Marc Niethammer; Melissa Troester; Roland Kwitt

arXiv: 1907.07739 · v1 · pith:XS6X37TBnew · submitted 2019-07-17 · 💻 cs.LG · cs.CV · stat.ML

Heather D. Couture, Roland Kwitt, J.S. Marron, Melissa Troester, Charles M. Perou, Marc Niethammer

Pith reviewed 2026-05-24 20:17 UTC · model grok-4.3

classification 💻 cs.LG · cs.CV · stat.ML

keywords multi-view learning · canonical correlation analysis · deep CCA · supervised CCA · cross-view classification · semi-supervised learning

The pith

Simultaneously optimizing CCA correlation and a task objective in one deep network produces a shared latent space that is both highly correlated across views and more discriminative.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that standard CCA ignores class labels and that prior fixes either fail to optimize the projection and the downstream task jointly or remain linear. By training a deep network end-to-end on both the CCA correlation loss and a task loss, the method learns non-linear projections whose latent space supports better cross-view classification, view-based regularization, and semi-supervised learning. A sympathetic reader would care because multi-view data appear in many practical settings where one view is expensive or missing at test time.

Core claim

Simultaneously optimizing a CCA-based objective and a task objective in an end-to-end manner learns a non-linear CCA projection to a shared latent space that is highly correlated across views and discriminative for the task.

What carries the argument

Joint end-to-end optimization of the CCA correlation objective together with a task-specific objective inside a single deep network.
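As a rough illustration of what such a joint objective can look like, the sketch below adds a task term (softmax cross-entropy on each view's projection) to a cross-view alignment term (mean squared distance between paired projections). It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the weight `lam`, and the choice of squared distance as the alignment term are all hypothetical, and any whitening or decorrelation step is omitted. Unlabelled samples (label `None`) contribute only to the alignment term, which is one plausible way to cover the semi-supervised setting.

```python
import math

def softmax_cross_entropy(logits, label):
    """Cross-entropy of one sample, computed with a stable log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[label]

def squared_l2(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def joint_loss(z1, z2, logits1, logits2, labels, lam=1.0):
    """Task loss on labelled samples plus lam times a cross-view alignment term.

    z1, z2    : per-sample projections of the two views (lists of float lists)
    logits1/2 : per-sample class scores predicted from each projection
    labels    : class index per sample, or None when the sample is unlabelled
    lam       : hypothetical weight balancing the two terms (an assumption)
    """
    n = len(z1)
    # Alignment term: uses every paired sample, labelled or not.
    align = sum(squared_l2(a, b) for a, b in zip(z1, z2)) / n
    # Task term: averaged over labelled samples only, summed over both views.
    labelled = [i for i, y in enumerate(labels) if y is not None]
    task = 0.0
    if labelled:
        task = sum(
            softmax_cross_entropy(logits1[i], labels[i])
            + softmax_cross_entropy(logits2[i], labels[i])
            for i in labelled
        ) / len(labelled)
    return task + lam * align, task, align

# Toy usage: three paired samples, the last one unlabelled and misaligned.
z1 = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
z2 = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]]
logits = [[5.0, -5.0], [-5.0, 5.0], [0.0, 0.0]]
labels = [0, 1, None]
total, task, align = joint_loss(z1, z2, logits, logits, labels, lam=0.5)
print(total, task, align)
```

In a real network both terms would be differentiated with respect to shared weights, so the gradient of each term shapes the same latent space; the sketch only evaluates the forward value.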

If this is right

The approach yields measurable gains in cross-view classification accuracy over prior state-of-the-art methods including deep supervised baselines.
The same joint objective supplies effective regularization when a second view is available at training time.
Performance improves in semi-supervised regimes where labels are scarce for one or both views.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same joint-training pattern could be applied to other multi-modal fusion problems that currently treat alignment and prediction as separate stages.
Stability of the combined loss could be tested by scaling the method to larger or noisier view pairs where correlation and discrimination objectives conflict more strongly.

Load-bearing premise

Jointly optimizing the two objectives will keep the latent space highly correlated while making it more discriminative, without the combined loss introducing instability or overfitting that cancels the gain.

What would settle it

A direct comparison on the same real multi-view datasets in which the jointly trained model fails to outperform either linear supervised CCA or a deep network that trains the CCA projection and the task head separately.

Figures

Figures reproduced from arXiv: 1907.07739 by Charles M. Perou, Heather D. Couture, J.S. Marron, Marc Niethammer, Melissa Troester, Roland Kwitt.

**Figure 1.** Deep CCA architectures: (a) DCCA maximizes the sum correlation in projection space by optimizing an equivalent loss, the trace norm objective (TNO) [3]; (b) SoftCCA relaxes the orthogonality constraints by regularizing with soft decorrelation (Decorr) and optimizes the $\ell_2$ distance in the projection space (equivalent to sum correlation with activations normalized to unit variance) [8]. Our TOCCA methods add…

**Figure 2.** Left: Sum correlation vs. cross-view classification accuracy (on MNIST) across different hyperparameter settings on a training set size of 10,000 for DCCA [3], SoftCCA [8], TOCCA-W, and TOCCA-SD. For unsupervised methods (DCCA and SoftCCA), large correlations do not necessarily imply good accuracy. Right: The effect of batch size on classification accuracy for each TOCCA method on MNIST (training set size…

**Figure 3.** t-SNE plots for CCA methods on our variation of MNIST. Each method was used to compute projections for the two views (left and right sides of the images) using 10,000 training examples. The plots show a visualization of the projection for the left view with each digit colored differently. TOCCA-SD and TOCCA-ND (not shown) produced similar results to TOCCA-W.
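The Figure 1 caption notes that the ℓ2 distance in projection space is equivalent to sum correlation when activations are normalized to unit variance. A small numeric check makes this concrete: for k zero-mean, unit-variance dimensions, the mean squared ℓ2 distance between paired projections equals 2k minus twice the sum of per-dimension correlations. The data and helper names below are hypothetical, and the check covers only this identity; it does not enforce the decorrelation between dimensions that CCA additionally requires.

```python
import math
import random

def standardize(column):
    """Shift to zero mean and scale to unit (population) variance."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in column) / n)
    return [(x - mean) / std for x in column]

def sum_correlation(A, B):
    """Sum over dimensions of the Pearson correlation of matching columns.

    A and B are lists of already standardized columns, so each correlation
    is the mean of the elementwise product of the two columns.
    """
    return sum(
        sum(a * b for a, b in zip(ca, cb)) / len(ca) for ca, cb in zip(A, B)
    )

def mean_squared_l2(A, B):
    """Mean over samples of the squared l2 distance between projections."""
    n = len(A[0])
    return sum(
        sum((a - b) ** 2 for a, b in zip(ca, cb)) for ca, cb in zip(A, B)
    ) / n

random.seed(0)
n, k = 200, 3
# Hypothetical projections: view 2 is a noisy copy of view 1.
view1 = [[random.gauss(0, 1) for _ in range(n)] for _ in range(k)]
view2 = [[x + random.gauss(0, 0.5) for x in col] for col in view1]
A = [standardize(c) for c in view1]
B = [standardize(c) for c in view2]

lhs = mean_squared_l2(A, B)               # distance the l2 loss minimizes
rhs = 2 * k - 2 * sum_correlation(A, B)   # decreases as sum correlation grows
print(lhs, rhs)
```

Because the two quantities differ only by a constant and a negative factor, minimizing the distance and maximizing the sum correlation pick out the same projections once variance is fixed.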

Original abstract

Canonical Correlation Analysis (CCA) is widely used for multimodal data analysis and, more recently, for discriminative tasks such as multi-view learning; however, it makes no use of class labels. Recent CCA methods have started to address this weakness but are limited in that they do not simultaneously optimize the CCA projection for discrimination and the CCA projection itself, or they are linear only. We address these deficiencies by simultaneously optimizing a CCA-based and a task objective in an end-to-end manner. Together, these two objectives learn a non-linear CCA projection to a shared latent space that is highly correlated and discriminative. Our method shows a significant improvement over previous state-of-the-art (including deep supervised approaches) for cross-view classification, regularization with a second view, and semi-supervised learning on real data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes Task-Optimal CCA, a deep multi-view learning approach that jointly optimizes a CCA correlation objective together with a task-specific loss inside a single end-to-end network. This produces a non-linear projection onto a shared latent space that is both highly correlated across views and more discriminative than those obtained by separately trained or purely supervised baselines. Real-data experiments are reported to show significant gains over prior state-of-the-art (including deep supervised methods) on cross-view classification, view-regularized supervised learning, and semi-supervised settings.

Significance. If the empirical gains are reproducible and not artifacts of hyper-parameter tuning or dataset choice, the work would constitute a useful practical advance in multi-view representation learning by closing the gap between correlation-maximizing and task-driven objectives inside the same differentiable pipeline. The method directly tests whether end-to-end joint optimization can simultaneously satisfy the two desiderata that standard CCA and earlier deep CCA variants address only sequentially.

major comments (2)

The central empirical claim rests on the joint loss producing a latent space that remains both correlated and task-discriminative; the manuscript should report the sensitivity of the reported gains to the relative weighting of the CCA and task terms (a free parameter listed in the axiom ledger) and include an ablation that isolates the contribution of each term.
Because the method is evaluated on cross-view classification, regularization with a second view, and semi-supervised regimes, the experimental section must clarify whether the same network architecture, loss weighting schedule, and optimization protocol are used across all three settings or whether task-specific tuning undermines the claim of a single unified approach.

minor comments (2)

Notation for the combined objective should be introduced once with explicit symbols for the CCA term, the task term, and the weighting hyper-parameter rather than being redefined inline in each experimental subsection.
The abstract states 'significant improvement' without numerical deltas; the results section should include a table that directly compares the proposed method against the cited baselines on each task with the same metrics and error bars.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and positive recommendation. We address the two major comments below and will revise the manuscript accordingly to strengthen the empirical presentation.

Point-by-point responses

Referee: The central empirical claim rests on the joint loss producing a latent space that remains both correlated and task-discriminative; the manuscript should report the sensitivity of the reported gains to the relative weighting of the CCA and task terms (a free parameter listed in the axiom ledger) and include an ablation that isolates the contribution of each term.

Authors: We agree that sensitivity analysis and ablation are valuable for validating the joint objective. In the revised manuscript we will add (i) an ablation isolating the CCA term versus the task term and (ii) results across a range of weighting values for the CCA loss to demonstrate that the reported gains are not artifacts of a single hyper-parameter choice. revision: yes

Referee: Because the method is evaluated on cross-view classification, regularization with a second view, and semi-supervised regimes, the experimental section must clarify whether the same network architecture, loss weighting schedule, and optimization protocol are used across all three settings or whether task-specific tuning undermines the claim of a single unified approach.

Authors: The same network architecture, loss weighting schedule, and optimization protocol are used in all three regimes; only the supervision signal (labels or lack thereof) changes according to the task. We will add an explicit statement and a summary table in the experimental section of the revision to remove any ambiguity about the unified protocol. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper presents an empirical method for jointly optimizing a CCA correlation objective and a task-specific loss in an end-to-end deep network to produce a shared latent space. All central claims concern measurable performance gains on real-data tasks (cross-view classification, view regularization, semi-supervised learning) relative to baselines; these are validated experimentally rather than derived from a closed mathematical chain. No equations reduce a claimed prediction to a fitted parameter by construction, no uniqueness theorems are imported via self-citation, and no ansatz or renaming of known results is presented as a derivation. The work is therefore self-contained as an empirical demonstration.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

Abstract-only review; implementation details, network architectures, loss weighting, and optimization choices are unknown, so the ledger is necessarily incomplete.

free parameters (2)

network depth and width
Deep non-linear CCA requires choosing architecture hyperparameters that are fitted or selected on data.
loss weighting between CCA and task terms
Joint optimization requires a scalar that balances the two objectives and is chosen to achieve reported performance.

axioms (1)

domain assumption: Paired multi-view samples with class labels are available for training
The method is described for supervised and semi-supervised multi-view settings that presuppose such paired labeled data.

pith-pipeline@v0.9.0 · 5680 in / 1211 out tokens · 19251 ms · 2026-05-24T20:17:47.856068+00:00 · methodology

Reference graph

Works this paper leans on

39 extracted references · 39 canonical work pages · 6 internal anchors

[1] Harold Hotelling. Relations between two sets of variates. Biometrika, 28(3/4):321–377, dec 1936.
[2] Tijl De Bie, Nello Cristianini, and Roman Rosipal. Eigenproblems in pattern recognition. In Handbook of Geometric Computing, pages 129–167. Springer Berlin Heidelberg, 2005.
[3] Galen Andrew, Raman Arora, Jeff Bilmes, and Karen Livescu. Deep Canonical Correlation Analysis. In Proc. ICML, 2013.
[4] Weiran Wang, Raman Arora, Karen Livescu, and Jeff Bilmes. On deep multi-view representation learning. In Proc. ICML, 2015.
[5] Weiran Wang, Raman Arora, Karen Livescu, and Nathan Srebro. Stochastic optimization for deep CCA via nonlinear orthogonal iterations. In Proc. Allerton Conference on Communication, Control, and Computing, 2016.
[6] Meina Kan, Shiguang Shan, Haihong Zhang, Shihong Lao, and Xilin Chen. Multi-view Discriminant Analysis. IEEE PAMI, 2015.
[7] Sarath Chandar, Mitesh M. Khapra, Hugo Larochelle, and Balaraman Ravindran. Correlational Neural Networks. Neural Computation, 28(2):257–285, feb 2016.
[8] Xiaobin Chang, Tao Xiang, and Timothy M. Hospedales. Scalable and Effective Deep CCA via Soft Decorrelation. In Proc. CVPR, 2018.
[9] Mehmet Emre Sargin, Yücel Yemez, Engin Erzin, and A Murat Tekalp. Audiovisual synchronization and fusion using canonical correlation analysis. IEEE Transactions on Multimedia, 9(7):1396–1403, 2007.
[10] Matthias Dorfer and Gerhard Widmer. Towards Deep and Discriminative Canonical Correlation Analysis. In Proc. ICML Workshop on Multi-view Representation Learning, 2016.
[11] Raman Arora and Karen Livescu. Kernel CCA for multi-view learning of acoustic features using articulatory measurements. In Symposium on Machine Learning in Speech and Language Processing, 2012.
[12] George Lee, Asha Singanamalli, Haibo Wang, Michael D Feldman, Stephen R Master, Natalie N C Shih, Elaine Spangler, Timothy Rebbeck, John E Tomaszewski, and Anant Madabhushi. Supervised multi-view canonical correlation analysis (sMVCCA): integrating histologic and proteomic features for predicting recurrent prostate cancer. IEEE Transactions on Medical Ima...
[13] Asha Singanamalli, Haibo Wang, George Lee, Natalie Shih, Mark Rosen, Stephen Master, John Tomaszewski, Michael Feldman, and Anant Madabhushi. Supervised multi-view canonical correlation analysis: fused multimodal prediction of disease diagnosis and prognosis. In Proc. SPIE Medical Imaging, 2014.
[14] Kanghong Duan, Hongxin Zhang, and Jim Jing Yan Wang. Joint learning of cross-modal classifier and factor analysis for multimedia data classification. Neural Computing and Applications, 27(2):459–468, feb 2016.
[15] Matthias Dorfer, Jan Schlüter, Andreu Vall, Filip Korzeniowski, and Gerhard Widmer. End-to-end cross-modality retrieval with CCA projections and pairwise ranking loss. International Journal of Multimedia Information Retrieval, 7(2):117–128, jun 2018.
[16] Sumit Shekhar, Vishal M Patel, Nasser M Nasrabadi, and Rama Chellappa. Joint sparse representation for robust multimodal biometrics recognition. IEEE PAMI, 36(1):113–26, jan 2014.
[17] Xing Xu, Atsushi Shimada, Rin-ichiro Taniguchi, and Li He. Coupled dictionary learning and feature mapping for cross-modal retrieval. In Proc. International Conference on Multimedia and Expo, 2015.
[18] Miriam Cha, Youngjune Gwon, and H. T. Kung. Multimodal sparse representation learning and applications. arXiv preprint: 1511.06238, 2015.
[19] Soheil Bahrampour, Nasser M. Nasrabadi, Asok Ray, and W. Kenneth Jenkins. Multimodal Task-Driven Dictionary Learning for Image Classification. arXiv preprint: 1502.01094, 2015.
[20] Gaurav Bhatt, Piyush Jha, and Balasubramanian Raman. Common Representation Learning Using Step-based Correlation Multi-Modal CNN. arXiv preprint: 1711.00003, 2017.
[21] Lei Huang, Dawei Yang, Bo Lang, and Jia Deng. Decorrelated Batch Normalization. In Proc. CVPR, 2018.
[22] Yann LeCun. The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/, 1998.
[23] Natalia Y. Bilenko and Jack L. Gallant. Pyrcca: regularized kernel canonical correlation analysis in Python and its applications to neuroimaging. Frontiers in Neuroinformatics, 10, nov 2016.
[24] Dongge Li, Nevenka Dimitrova, Mingkun Li, and Ishwar K. Sethi. Multimedia content processing through cross-modal association. In Proc. ACM International Conference on Multimedia, 2003.
[25] Dominic Masters and Carlo Luschi. Revisiting Small Batch Training for Deep Neural Networks. arXiv preprint: 1804.07612, 2018.
[26] Sergey Ioffe and Christian Szegedy. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In Proc. ICML, 2015.
[27] Alex Krizhevsky, Ilya Sutskever, and Geoff Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1106–1114, 2012.
[28] Matthias Dorfer, Rainer Kelz, and Gerhard Widmer. Deep linear discriminant analysis. In Proc. ICLR, 2016.
[29] Jared Katzman, Uri Shaham, Alexander Cloninger, Jonathan Bates, Tingting Jiang, and Yuval Kluger. Deep Survival: A Deep Cox Proportional Hazards Network. arXiv preprint: 1606.00931, 2016.
[30] Mathilde Caron, Piotr Bojanowski, Armand Joulin, and Matthijs Douze. Deep clustering for unsupervised learning of visual features. In Proc. ECCV, 2018.
[31] Agnan Kessy, Alex Lewin, and Korbinian Strimmer. Optimal whitening and decorrelation. arXiv preprint: 1512.00809, 2015.
[32] L Van Der Maaten and G Hinton. Visualizing high-dimensional data using t-SNE. Journal of Machine Learning Research, 9:26, 2008.
[33] MA Troester, Xuezheng Sun, Emma H. Allott, Joseph Geradts, Stephanie M Cohen, Chui Kit Tse, Erin L. Kirk, Leigh B Thorne, Michelle Matthews, Yan Li, Zhiyuan Hu, Whitney R. Robinson, Katherine A. Hoadley, Olufunmilayo I. Olopade, Katherine E. Reeder-Hayes, H. Shelton Earp, Andrew F. Olshan, LA Carey, and Charles M. Perou. Racial differences in PAM50 subtyp...
[34] Karen Simonyan and Andrew Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proc. ICLR, 2015.
[35] Joel S Parker, Michael Mullins, Maggie CU Cheang, Samuel Leung, David Voduc, Tammi Vickery, Sherri Davies, Christiane Fauron, Xiaping He, et al. Supervised risk predictor of breast cancer based on intrinsic subtypes. Journal of Clinical Oncology, 27(8):1160–1167, 2009.
[36] Weiran Wang, Raman Arora, Karen Livescu, and Jeff A. Bilmes. Unsupervised learning of acoustic features via deep canonical correlation analysis. In Proc. ICASSP, 2015.
[37] Eric F Lock, Katherine A Hoadley, J S Marron, and Andrew B Nobel. Joint and Individual Variation Explained (JIVE) for Integrated Analysis of Multiple Data Types. The Annals of Applied Statistics, 7(1):523–542, mar 2013.
[38] Priyadip Ray, Lingling Zheng, Joseph Lucas, and Lawrence Carin. Bayesian joint analysis of heterogeneous genomics data. Bioinformatics, 30(10):1370–6, may 2014.
[39] Qing Feng, Meilei Jiang, Jan Hannig, and JS Marron. Angle-based joint and individual variation explained. Journal of Multivariate Analysis, 166:241–265, 2018.

Supplementary Material: This supplementary material includes additional details on our TOCCA algorithm and experiments, including 1) a comparison of our formulation with other related CCA approach...
