Recognition: 2 Lean theorem links
EviDep: Trustworthy Multimodal Depression Estimation via Disentangled Evidential Learning
Pith reviewed 2026-05-11 01:41 UTC · model grok-4.3
The pith
EviDep estimates depression severity from video and audio while also reporting aleatoric and epistemic uncertainty to reduce overconfident predictions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
EviDep jointly quantifies depression severity alongside aleatoric and epistemic uncertainties via a Normal-Inverse-Gamma distribution. A Frequency-aware Feature Extraction module with wavelet-based Mixture-of-Experts decouples macro-level affective baselines from micro-level behavioral bursts to filter task-irrelevant artifacts. A Disentangled Evidential Learning strategy then decorrelates cross-modal shared consensus from modality-specific nuances before Bayesian fusion, strictly preventing double-counting of overlapping information.
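To make the distribution concrete, here is a minimal sketch of a Normal-Inverse-Gamma output head in the standard deep-evidential-regression parameterization (gamma, nu, alpha, beta), from which the severity estimate and both uncertainties fall out in closed form. Layer names and activation choices are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NIGHead(nn.Module):
    """Normal-Inverse-Gamma head: one linear layer emits (gamma, nu, alpha, beta).
    Illustrative sketch of evidential regression, not the paper's architecture."""

    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, 4)

    def forward(self, h: torch.Tensor):
        gamma, nu_raw, alpha_raw, beta_raw = self.proj(h).chunk(4, dim=-1)
        nu = F.softplus(nu_raw)                  # nu > 0
        alpha = F.softplus(alpha_raw) + 1.0      # alpha > 1 keeps the moments finite
        beta = F.softplus(beta_raw)              # beta > 0
        pred = gamma                             # severity estimate: E[mu] = gamma
        aleatoric = beta / (alpha - 1.0)         # data noise: E[sigma^2]
        epistemic = beta / (nu * (alpha - 1.0))  # model doubt: Var[mu]
        return pred, aleatoric, epistemic
```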
What carries the argument
The Disentangled Evidential Learning strategy, which separates shared consensus features from modality-specific nuances before evidential fusion, paired with the wavelet-based Mixture-of-Experts in the Frequency-aware Feature Extraction module.
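The abstract names explicit decorrelation but shows no loss; one common realization is a cross-covariance penalty between the shared-consensus and modality-specific embeddings, sketched here under that assumption.

```python
import torch

def decorrelation_loss(shared: torch.Tensor, specific: torch.Tensor) -> torch.Tensor:
    """Cross-covariance penalty between shared and modality-specific features,
    both of shape (batch, dim). Driving every cross-covariance entry toward
    zero is one standard way to enforce explicit decorrelation; the paper's
    exact objective may differ."""
    s = shared - shared.mean(dim=0, keepdim=True)
    m = specific - specific.mean(dim=0, keepdim=True)
    cov = s.t() @ m / max(shared.shape[0] - 1, 1)  # (dim_shared, dim_specific)
    return cov.pow(2).mean()
```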
If this is right
- Provides risk-aware outputs that let downstream users weight high-uncertainty cases more cautiously.
- Reduces double-counting of redundant multimodal information, yielding better-calibrated confidence.
- Maintains state-of-the-art accuracy on AVEC 2013, AVEC 2014, DAIC-WOZ, and E-DAIC while adding uncertainty reporting.
- Handles temporal-frequency heterogeneity in behavioral cues without manual feature engineering.
Where Pith is reading between the lines
- The same separation of shared and specific signals could be tested on other multimodal health tasks where overlapping cues across sensors risk overcounting.
- If uncertainty tracks real clinical variability, the model could serve as a filter that flags cases needing human review before any automated recommendation (see the sketch after this list).
- Extending the wavelet Mixture-of-Experts to additional modalities such as text transcripts would test whether the frequency-decoupling benefit generalizes.
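The human-review idea in the second bullet has a standard selective-prediction form: abstain on the most uncertain fraction of cases and check whether error on the retained cases drops. A minimal sketch assuming per-case total uncertainty is available; the referral fraction is an arbitrary choice.

```python
import numpy as np

def mae_after_referral(y_true, y_pred, total_unc, refer_frac=0.2):
    """Refer the most uncertain `refer_frac` of cases to human review and
    report MAE on the automatically handled remainder. If uncertainty is
    informative, this should beat the full-coverage MAE."""
    y_true, y_pred, total_unc = map(np.asarray, (y_true, y_pred, total_unc))
    order = np.argsort(total_unc)                        # most confident first
    keep = order[: int(round(len(order) * (1 - refer_frac)))]
    return float(np.mean(np.abs(y_true[keep] - y_pred[keep])))
```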
Load-bearing premise
That the wavelet Mixture-of-Experts successfully removes artifacts without discarding depression-relevant signals and that explicit decorrelation of shared and specific features prevents confidence inflation without losing useful information.
What would settle it
Finding that uncertainty estimates do not rise for incorrect predictions on a new test split of the E-DAIC dataset, or that removing the disentanglement step leaves accuracy and calibration unchanged.
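The first test can be phrased as a rank check on a held-out split: if the uncertainties are risk-aware, they should rise with absolute error. A minimal sketch assuming access to predictions and total uncertainty; none of this code is from the paper.

```python
import numpy as np
from scipy.stats import spearmanr

def uncertainty_tracks_error(y_true, y_pred, total_unc):
    """Rank correlation between absolute error and reported uncertainty.
    A rho near zero or negative is the failure mode described above; a
    clearly positive rho is consistent with risk-aware behavior."""
    err = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    rho, pval = spearmanr(np.asarray(total_unc), err)
    return rho, pval
```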
Original abstract
Automated multimodal depression estimation in unconstrained environments is inherently challenged by naturalistic noise and complex behavioral variability. Prevailing deterministic methods, however, produce uncalibrated point estimates without quantifying predictive uncertainty, exposing decision-making to the risk of overconfident, untrustworthy estimates. To establish a reliable and trustworthy estimation paradigm, we propose EviDep, an evidential learning framework that jointly quantifies depression severity alongside aleatoric and epistemic uncertainties via a Normal-Inverse-Gamma distribution. To ensure the integrity of the extracted behavioral evidence and prevent artificial confidence inflation during multimodal fusion, EviDep introduces two tailored mechanisms. First, addressing the temporal-frequency heterogeneity of behavioral cues, a Frequency-aware Feature Extraction module leverages a wavelet-based Mixture-of-Experts to dynamically decouple stable macro-level affective baselines from transient micro-level behavioral bursts, effectively filtering out task-irrelevant artifacts. Second, a Disentangled Evidential Learning strategy enforces explicit decorrelation of features in these purified representations. By separating the cross-modal shared consensus from modality-specific behavioral nuances before Bayesian fusion, this rigorous disentanglement strictly prevents the model from double-counting overlapping information. Extensive experiments on the AVEC 2013, AVEC 2014, DAIC-WOZ, and E-DAIC datasets confirm that EviDep achieves state-of-the-art predictive accuracy and superior uncertainty calibration, thereby delivering a trustworthy, risk-aware decision-support tool for depression estimation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes EviDep, a multimodal evidential learning framework for depression severity estimation from behavioral cues. It models predictions via a Normal-Inverse-Gamma distribution to jointly output severity scores along with aleatoric and epistemic uncertainties. Two core mechanisms are introduced: a Frequency-aware Feature Extraction module that employs a wavelet-based Mixture-of-Experts to separate macro-level affective baselines from micro-level bursts, and a Disentangled Evidential Learning strategy that explicitly decorrelates shared cross-modal consensus from modality-specific features prior to Bayesian fusion. Experiments across the AVEC 2013, AVEC 2014, DAIC-WOZ, and E-DAIC datasets are reported to achieve state-of-the-art accuracy while delivering superior uncertainty calibration.
Significance. If the empirical claims hold, the work offers a substantive contribution to trustworthy multimodal learning for mental health applications by moving beyond uncalibrated point estimates to risk-aware predictions. The integration of evidential deep learning with domain-specific disentanglement for temporal-frequency heterogeneity addresses a genuine practical need. The multi-dataset evaluation provides a reasonable empirical foundation, and the explicit focus on preventing overcounting in fusion is a clear methodological strength.
Major comments (3)
- [§3.2] §3.2 (Frequency-aware Feature Extraction): The assertion that the wavelet-based Mixture-of-Experts successfully isolates task-irrelevant artifacts while preserving depression-relevant variance lacks supporting evidence such as expert activation maps, frequency-domain ablations, or quantitative comparison of retained signal variance before/after the module. Without these, the downstream claim of trustworthy uncertainty quantification cannot be fully evaluated.
- [§3.3] §3.3 and §4.3 (Disentangled Evidential Learning and ablations): The decorrelation loss is presented as strictly preventing double-counting of overlapping information, yet no ablation isolating its effect on both predictive accuracy (e.g., RMSE/MAE deltas) and calibration metrics (e.g., ECE or NLL) is reported. If the loss is too aggressive it could attenuate modality-specific severity cues, directly undermining the central trustworthiness argument.
- [Table 2] Table 2 / §4.2 (main results): The SOTA accuracy and calibration claims rest on the two unverified functional assumptions identified above. Direct comparisons against strong multimodal baselines with uncertainty heads (e.g., MC-dropout or deep ensembles) and explicit reporting of uncertainty calibration curves or sharpness metrics would be required to substantiate superiority (a sketch of such calibration metrics follows this list).
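The calibration metrics requested above have standard forms; the sketch below gives one common recipe for regression ECE (central-interval coverage) and Gaussian NLL. These are generic definitions, not necessarily the metrics the paper would report.

```python
import numpy as np
from scipy.stats import norm

def regression_ece(y_true, mu, sigma, n_levels=9):
    """Interval-coverage calibration error: for each nominal level p, compare
    empirical coverage of the central p-interval of N(mu, sigma^2) to p."""
    y_true, mu, sigma = map(np.asarray, (y_true, mu, sigma))
    gaps = []
    for p in np.linspace(0.1, 0.9, n_levels):
        z = norm.ppf(0.5 + p / 2.0)                  # half-width in std units
        coverage = np.mean(np.abs(y_true - mu) <= z * sigma)
        gaps.append(abs(coverage - p))
    return float(np.mean(gaps))

def gaussian_nll(y_true, mu, sigma):
    """Mean negative log-likelihood under the predictive Gaussian."""
    y_true, mu, sigma = map(np.asarray, (y_true, mu, sigma))
    return float(np.mean(0.5 * np.log(2 * np.pi * sigma ** 2)
                         + (y_true - mu) ** 2 / (2 * sigma ** 2)))
```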
Minor comments (3)
- [§3.1] Notation for the Normal-Inverse-Gamma parameters (e.g., the four output heads) should be introduced with a single consolidated equation rather than scattered across subsections to improve readability.
- The manuscript would benefit from a dedicated limitations paragraph discussing potential failure modes when behavioral cues are sparse or when one modality is missing.
- [Figure 3] Figure 3 (architecture diagram) would be clearer with explicit arrows indicating the flow of the decorrelation loss and the final evidential fusion step.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback. We have carefully reviewed each major comment and provide point-by-point responses below. We agree that additional empirical evidence will strengthen the manuscript and will incorporate the requested analyses in the revised version.
Point-by-point responses
- Referee: [§3.2] §3.2 (Frequency-aware Feature Extraction): The assertion that the wavelet-based Mixture-of-Experts successfully isolates task-irrelevant artifacts while preserving depression-relevant variance lacks supporting evidence such as expert activation maps, frequency-domain ablations, or quantitative comparison of retained signal variance before/after the module. Without these, the downstream claim of trustworthy uncertainty quantification cannot be fully evaluated.
  Authors: We acknowledge that the current description of the Frequency-aware Feature Extraction module would benefit from direct empirical validation of its separation capabilities. In the revised manuscript, we will include expert activation maps, frequency-domain ablation results, and quantitative comparisons of retained signal variance (pre- and post-module) to demonstrate that depression-relevant features are preserved while task-irrelevant artifacts are attenuated. revision: yes
- Referee: [§3.3] §3.3 and §4.3 (Disentangled Evidential Learning and ablations): The decorrelation loss is presented as strictly preventing double-counting of overlapping information, yet no ablation isolating its effect on both predictive accuracy (e.g., RMSE/MAE deltas) and calibration metrics (e.g., ECE or NLL) is reported. If the loss is too aggressive it could attenuate modality-specific severity cues, directly undermining the central trustworthiness argument.
  Authors: We agree that an isolated ablation of the decorrelation loss is necessary to fully substantiate its contribution. Although §4.3 contains related ablations, they do not isolate the loss's impact on both accuracy and calibration. In the revision, we will add a dedicated ablation table reporting RMSE, MAE, ECE, and NLL deltas with and without the decorrelation loss, confirming that it improves calibration without unduly suppressing modality-specific cues. revision: yes
- Referee: [Table 2] Table 2 / §4.2 (main results): The SOTA accuracy and calibration claims rest on the two unverified functional assumptions identified above. Direct comparisons against strong multimodal baselines with uncertainty heads (e.g., MC-dropout or deep ensembles) and explicit reporting of uncertainty calibration curves or sharpness metrics would be required to substantiate superiority.
  Authors: We appreciate the suggestion to benchmark against additional uncertainty-aware multimodal methods. To strengthen the empirical claims, the revised manuscript will include direct comparisons to strong baselines such as MC-dropout and deep ensembles applied to the same multimodal inputs. We will also report uncertainty calibration curves and sharpness metrics alongside the existing results to provide a more complete evaluation of calibration quality. revision: yes
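The MC-dropout baseline promised above has a standard recipe: keep dropout stochastic at test time and read epistemic uncertainty off the spread of repeated forward passes. A generic sketch, not the authors' code; only dropout modules are switched to train mode so batch-norm statistics stay frozen.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def mc_dropout_predict(model: nn.Module, x: torch.Tensor, n_samples: int = 30):
    """Run n_samples stochastic forward passes with dropout enabled and
    return the predictive mean and the across-sample variance (a proxy
    for epistemic uncertainty)."""
    model.eval()
    for m in model.modules():
        if isinstance(m, nn.Dropout):
            m.train()                 # re-enable dropout only
    preds = torch.stack([model(x) for _ in range(n_samples)])
    return preds.mean(dim=0), preds.var(dim=0)
```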
Circularity Check
No circularity detected; derivation chain not inspectable via equations
Full rationale
The visible abstract and framework description introduce modules (wavelet-based MoE for frequency-aware extraction and explicit decorrelation in disentangled evidential learning) but present no equations, parameter-fitting steps, or derivation chains that reduce outputs to inputs by construction. Claims of SOTA accuracy and calibration rest on experiments across external datasets (AVEC 2013/2014, DAIC-WOZ, E-DAIC), which constitute independent benchmarks rather than self-referential fits. No self-citations, uniqueness theorems, or ansatzes are invoked in a load-bearing manner within the provided text. This is the common case of a self-contained empirical proposal without mathematical circularity.
Axiom & Free-Parameter Ledger
Axioms (2)
- Domain assumption: Behavioral signals exhibit temporal-frequency heterogeneity that can be decoupled by a wavelet-based Mixture-of-Experts into stable baselines and transient bursts (see the sketch after this list).
- Domain assumption: Multimodal representations contain separable cross-modal consensus and modality-specific nuances that can be explicitly decorrelated before fusion.
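The first axiom is easy to exercise directly: a discrete wavelet transform already splits a behavioral time series into a slow baseline and fast bursts. A sketch using PyWavelets; the wavelet family and decomposition level are arbitrary choices, and the paper's expert routing is not reproduced.

```python
import numpy as np
import pywt

def decouple_scales(signal, wavelet="db4", level=4):
    """Reconstruct the macro-level baseline from the approximation
    coefficients alone; the residual carries the micro-level bursts."""
    x = np.asarray(signal, dtype=float)
    coeffs = pywt.wavedec(x, wavelet, level=level)
    approx, details = coeffs[0], coeffs[1:]
    baseline = pywt.waverec([approx] + [np.zeros_like(d) for d in details], wavelet)
    n = min(len(x), len(baseline))               # waverec can pad by one sample
    return baseline[:n], x[:n] - baseline[:n]    # (baseline, bursts)
```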
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tagged unclear
  Unclear: relation between the paper passage and the cited Recognition theorem.
  Passage: "Frequency-aware Feature Extraction module leverages a wavelet-based Mixture-of-Experts to dynamically decouple stable macro-level affective baselines from transient micro-level behavioral bursts"
- IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection · tagged unclear
  Unclear: relation between the paper passage and the cited Recognition theorem.
  Passage: "Disentangled Evidential Learning strategy enforces explicit decorrelation of features... orthogonality and consistency constraints"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.