pith. machine review for the scientific record.

arxiv: 2605.06303 · v1 · submitted 2026-05-07 · 💻 cs.LG

Recognition: unknown

Molecules Meet Language: Confound-Aware Representation Learning and Chemical Property Steering in Transformer-VAE Latent Spaces

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 13:01 UTC · model grok-4.3

classification: 💻 cs.LG
keywords: Transformer-VAE · SELFIES · latent space steering · confound-aware evaluation · molecular generative models · RDKit descriptors · property prediction

The pith

Linear probes on Transformer-VAE latent spaces yield steerable directions for chemical properties in molecules after accounting for SELFIES artifacts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Molecular generative models like Transformer-VAEs trained on SELFIES can produce latent spaces where chemical properties appear predictable, but that predictability may come from sequence shortcuts rather than real chemistry. The paper trains such a model unsupervised, then fits linear probes to RDKit chemical descriptors to find candidate steering vectors in the latent space. To check whether these directions are genuine, the authors develop a confound-aware method combining residualization against token-level artifacts, alignment checks against confound directions, and verification of property changes in actually decoded molecules. This approach identifies reliable steering for several properties, including cLogP and heavy atom count. The findings indicate that meaningful chemical control can arise in these entangled representations when carefully validated.
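A minimal sketch of that probe-fitting step, assuming synthetic stand-ins for the frozen encoder latents and the RDKit descriptor (the paper's actual model and probe settings are not reproduced here): fit a ridge probe and normalize its weights into a candidate steering direction.

```python
# Hedged sketch: probe a frozen latent space for one chemical descriptor.
# Z and y are synthetic stand-ins, not the paper's data.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
Z = rng.normal(size=(2048, 64))                                  # frozen encoder latents
y = Z @ rng.normal(size=64) + rng.normal(scale=0.1, size=2048)   # e.g. cLogP values

Z_tr, Z_te, y_tr, y_te = train_test_split(Z, y, random_state=0)
probe = Ridge(alpha=1.0).fit(Z_tr, y_tr)
print(f"held-out R^2 = {probe.score(Z_te, y_te):.3f}")

direction = probe.coef_ / np.linalg.norm(probe.coef_)            # candidate steering vector
```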

Core claim

In an unsupervised autoregressive Transformer-VAE trained on SELFIES strings, the latent space encodes RDKit chemical descriptors in directions that permit monotonic steering. Before its chemical validity can be confirmed, however, this signal must be separated from the strong encoding of SELFIES-specific features (length, branch tokens, ring tokens, and token entropy) through residualization and decoded-molecule traversal.
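The alignment part of this check can be pictured as a cosine comparison between a property's probe direction and probe directions fit to each confound; a large absolute cosine flags a property axis that may be riding on an artifact. A hedged sketch with random stand-in directions (real ones would come from probes as above):

```python
# Illustrative confound-direction alignment check; directions are synthetic.
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(1)
prop_dir = rng.normal(size=64)                     # e.g. the cLogP probe weights
confound_dirs = {                                  # probes fit to each confound
    name: rng.normal(size=64)
    for name in ["length", "branch_tokens", "ring_tokens", "token_entropy"]
}

worst = max(abs(cosine(prop_dir, d)) for d in confound_dirs.values())
print(f"max |cos| to any confound direction: {worst:.3f}")
```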

What carries the argument

The confound-aware evaluation procedure (residualization, confound-direction alignment analysis, and decoded-molecule traversal) that isolates chemical property signals from representation artifacts in the latent space.
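Residualization itself reduces to regressing the latent codes on the confound matrix and keeping the residuals, then comparing probe R² before and after. A minimal sketch on synthetic data (the variable names and the leak structure are assumptions for illustration):

```python
# Hedged sketch: residualize latents Z on confounds C, compare probe R^2.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
C = rng.normal(size=(4096, 4))                            # length, branch, ring, entropy
Z = rng.normal(size=(4096, 64)) + 0.5 * C @ rng.normal(size=(4, 64))  # latents leak confounds
y = 1.5 * C[:, 0] + Z[:, 0]                               # descriptor partly confound-driven

Z_resid = Z - LinearRegression().fit(C, Z).predict(C)     # strip confound-predictable part

for name, X in [("raw", Z), ("residual", Z_resid)]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    print(f"{name} R^2 = {Ridge(alpha=1.0).fit(X_tr, y_tr).score(X_te, y_te):.3f}")
```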

Load-bearing premise

The chosen confounds of SELFIES length, branch tokens, ring tokens, and token entropy, combined with residualization and decoded-molecule traversal, are sufficient to fully isolate chemical signals from all representation artifacts.
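For concreteness, the four stated confounds can be computed directly from a SELFIES string. The sketch below uses the `selfies` package; matching branch and ring tokens by substring is an assumption about the tokenization, not the paper's code:

```python
# Hedged sketch: the four SELFIES-level confound statistics named in the paper.
import math
from collections import Counter
import selfies as sf

def confound_features(selfies_string):
    tokens = list(sf.split_selfies(selfies_string))
    counts = Counter(tokens)
    n = len(tokens)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return [
        float(n),                                            # SELFIES length
        sum(c for t, c in counts.items() if "Branch" in t),  # branch tokens
        sum(c for t, c in counts.items() if "Ring" in t),    # ring tokens
        entropy,                                             # token entropy
    ]

print(confound_features(sf.encoder("c1ccccc1CC(=O)O")))      # phenylacetic acid
```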

What would settle it

Observing that steered latent directions no longer produce the expected monotonic changes in the target RDKit properties when the decoded molecules are examined after residualization against the confounds.
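In code, that falsification test is a latent traversal: step along the candidate direction, decode each point, recompute the target descriptor on the decoded molecule, and test monotonicity. The decoder below is a toy stand-in (the paper's frozen Transformer-VAE is not reproduced), and RDKit's Crippen cLogP stands in for the target property:

```python
# Hedged sketch: decoded-molecule traversal with a monotonicity test.
import numpy as np
from scipy.stats import spearmanr
from rdkit import Chem
from rdkit.Chem import Crippen

def toy_decode(z):
    # stand-in for the frozen VAE decoder: longer alkane as z[0] grows
    n = int(np.clip(np.round(4 + z[0]), 1, 20))
    return "C" * n

def traversal_monotonicity(z0, direction, decode, prop, steps):
    pairs = []
    for a in steps:
        mol = Chem.MolFromSmiles(decode(z0 + a * direction))
        if mol is not None:                          # skip invalid decodes
            pairs.append((a, prop(mol)))
    alphas, values = zip(*pairs)
    return spearmanr(alphas, values)

rho, p = traversal_monotonicity(np.zeros(8), np.eye(8)[0], toy_decode,
                                Crippen.MolLogP, np.linspace(-3, 3, 13))
print(f"Spearman rho = {rho:.2f} (p = {p:.3g})")     # |rho| near 1 supports steering
```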

Figures

Figures reproduced from arXiv: 2605.06303 by Attila Cangi, Bartosz Brzoza, Jan Andrzejewski, Zakaria Elabid.

Figure 1. Overview of the proposed framework. A. SMILES are converted to SELFIES, tokenized, and used to train an autoregressive Transformer-VAE. B. The encoder is frozen and the latent space is probed for molecular properties Y and SELFIES-level confounds C; residualization removes the component predictable from the confounds. C. Raw and residual R² values, together with decoded-molecule traversals, separate linear…
Figure 2. Latent traversals for cLogP, FractionCSP3, TPSA, and HBA reporting the median trajectory…
Figure 3. Interpolation continuity for the current model. The curve reports the median adjacent…
Figure 4. Functional-family retention along family-conditioned interpolation paths. Bars report the…
Figure 5. Bootstrap stability of probe directions. Bars report the median cosine similarity between…
Figure 6. Control analyses. Left: permutation control, showing that test-set R² becomes approximately null when the training labels are permuted. Right: random-direction control, comparing the observed maximum absolute cosine similarity to confound directions against a null distribution from random latent directions.
Figure 7. Correlation between molecular properties and SELFIES-derived confounds. Pearson and…
Figure 8. Cosine similarity between property directions and confound directions in latent space.
Figure 9. Additional monotonic latent traversals from Section 4.5. BertzCT and HeavyAtomCount…
Figure 10. Inter-property structure in descriptor space and latent-direction space. Left: empirical…
Figure 11. Raw, confound, residual, and random-axis latent traversals for the linear-attention model.
Figure 12. Raw-axis latent traversals for the simple-attention model. The simple-attention baseline…
read the original abstract

Molecular generative models often assume meaningful latent geometry, but apparent property predictability can reflect sequence-level shortcuts rather than chemical organization. We study this issue in an unsupervised autoregressive Transformer-VAE trained on SELFIES. After training, we freeze the model, fit linear probes to RDKit descriptors, and use the probe weights as candidate global steering directions. To separate chemical signal from SELFIES artifacts, we introduce a confound-aware evaluation based on residualization, confound-direction alignment analysis, and decoded-molecule traversal. This is necessary because SELFIES length, branch tokens, ring tokens, and token entropy are strongly encoded in the latent space. Under this confound-aware evaluation, we find robust monotonic steering for cLogP, FractionCSP3, HeavyAtomCount, TPSA, BertzCT, and HBA. Nonlinear probes further show that some properties admit stable global directions, while others are better described by local latent gradients. Overall, our results show that chemically meaningful steering can emerge in entangled molecular latent spaces, but only when validated through decoded molecules and controlled for representation-level confounds.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript trains an unsupervised autoregressive Transformer-VAE on SELFIES molecular strings, freezes the encoder, and fits linear probes to RDKit descriptors to derive candidate global steering vectors in the latent space. It introduces a confound-aware protocol that residualizes the latent representations on four representation-level statistics (SELFIES length, branch-token count, ring-token count, token entropy), checks alignment between probe directions and confound directions, and validates steering by decoding molecules and measuring property changes. Under this protocol the authors report robust monotonic steering for cLogP, FractionCSP3, HeavyAtomCount, TPSA, BertzCT and HBA; they further contrast linear versus nonlinear probes and conclude that chemically organized geometry can be recovered from entangled latent spaces once representation artifacts are controlled.

Significance. If the residualization and decoded-molecule checks are shown to be sufficient, the work supplies a concrete, reproducible template for separating chemical signal from sequence-level shortcuts in language-model-based molecular generators. This is valuable because many prior claims of property steering in VAEs rest on untested assumptions about latent geometry. The explicit use of decoded traversals and the distinction between global and local directions are practical strengths that could be adopted by the field.

major comments (2)
  1. [Methods (confound residualization)] Confound-aware evaluation (Methods section describing residualization): the four chosen confounds (SELFIES length, branch/ring tokens, token entropy) are controlled, yet the manuscript provides no post-residualization correlation analysis between the cleaned latent codes and other string statistics (e.g., heteroatom-token n-grams or functional-group motifs) that are known to correlate with RDKit descriptors such as cLogP and TPSA. Because the central claim of chemically meaningful monotonic steering rests on the assumption that all representation artifacts have been removed, this gap is load-bearing and requires explicit testing or justification.
  2. [Results (monotonic steering)] Results on monotonic steering (the paragraph reporting the six properties): the claim of 'robust monotonic steering' is presented without reported effect sizes, confidence intervals, or the number of decoded molecules per direction. If the monotonicity is sensitive to starting latent point or decoding temperature, the result would not survive the confound-aware protocol; quantitative statistics across multiple traversals are therefore needed to support the claim.
minor comments (3)
  1. The abstract states that nonlinear probes 'further show' stable global directions for some properties; the main text should include the exact architecture, training procedure, and quantitative comparison (e.g., R² or steering success rate) between linear and nonlinear probes (a minimal version of such a comparison is sketched after this list).
  2. Notation for the residualized latent vectors and the probe-weight steering directions should be introduced once with a clear equation and then used consistently; currently the symbols shift between sections.
  3. Table or figure captions listing the six steered properties should also report the number of molecules decoded and the range of the steering coefficient used.
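A hedged sketch of the linear-versus-nonlinear comparison asked for in minor comment 1, on synthetic stand-in data (the paper does not specify its nonlinear probe here, so a small MLP is assumed):

```python
# Hedged sketch: held-out R^2 for a linear probe vs. a small MLP probe.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
Z = rng.normal(size=(4096, 64))
y = np.tanh(Z @ rng.normal(size=64)) + 0.1 * rng.normal(size=4096)  # mildly nonlinear target

Z_tr, Z_te, y_tr, y_te = train_test_split(Z, y, random_state=0)
linear = Ridge(alpha=1.0).fit(Z_tr, y_tr)
mlp = MLPRegressor(hidden_layer_sizes=(128,), max_iter=500, random_state=0).fit(Z_tr, y_tr)
print(f"linear R^2 = {linear.score(Z_te, y_te):.3f}, MLP R^2 = {mlp.score(Z_te, y_te):.3f}")
```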

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which identify important gaps in validating the completeness of our confound controls and in quantifying the steering results. We will revise the manuscript accordingly to strengthen the evidence for our claims.

read point-by-point responses
  1. Referee: [Methods (confound residualization)] Confound-aware evaluation (Methods section describing residualization): the four chosen confounds (SELFIES length, branch/ring tokens, token entropy) are controlled, yet the manuscript provides no post-residualization correlation analysis between the cleaned latent codes and other string statistics (e.g., heteroatom-token n-grams or functional-group motifs) that are known to correlate with RDKit descriptors such as cLogP and TPSA. Because the central claim of chemically meaningful monotonic steering rests on the assumption that all representation artifacts have been removed, this gap is load-bearing and requires explicit testing or justification.

    Authors: We agree that additional post-residualization checks are needed to support the assumption that representation artifacts have been adequately removed. In the revised manuscript, we will add explicit correlation analyses between the residualized latent codes and further string statistics, including heteroatom-token n-grams and functional-group motifs. We will report Pearson correlations (or equivalent) before and after residualization to demonstrate substantial reduction, and include these results in the Methods section along with a supplementary table or figure. This provides the requested explicit testing (a minimal audit of this kind is sketched after these responses). revision: yes

  2. Referee: [Results (monotonic steering)] Results on monotonic steering (the paragraph reporting the six properties): the claim of 'robust monotonic steering' is presented without reported effect sizes, confidence intervals, or the number of decoded molecules per direction. If the monotonicity is sensitive to starting latent point or decoding temperature, the result would not survive the confound-aware protocol; quantitative statistics across multiple traversals are therefore needed to support the claim.

    Authors: We concur that quantitative details are essential to substantiate the robustness of the monotonic steering. The revised Results section will report effect sizes for property changes along each direction, 95% confidence intervals derived from multiple traversals and starting latent points, and the number of decoded molecules evaluated per direction. We will also add sensitivity analyses varying starting points and decoding temperature, with full statistics provided in the main text and expanded in the supplementary materials (a bootstrap sketch of such statistics follows these responses). revision: yes
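Two hedged sketches of the analyses promised above, both on synthetic stand-ins rather than the authors' data. First, the post-residualization audit: residualize on the four controlled confounds, then check whether any latent coordinate still correlates with an extra string statistic such as a heteroatom-token n-gram count.

```python
# Hedged sketch: audit residual latents against an uncontrolled string statistic.
import numpy as np

def max_abs_corr(Z, s):
    """Largest |Pearson r| between any latent coordinate and statistic s."""
    Zc, sc = Z - Z.mean(0), s - s.mean()
    r = (Zc.T @ sc) / (np.linalg.norm(Zc, axis=0) * np.linalg.norm(sc) + 1e-12)
    return float(np.max(np.abs(r)))

rng = np.random.default_rng(3)
C = rng.normal(size=(4096, 4))                       # the four controlled confounds
Z = rng.normal(size=(4096, 64)) + 0.4 * C @ rng.normal(size=(4, 64))
s = Z[:, 7] + 0.3 * rng.normal(size=4096)            # stand-in: heteroatom n-gram count

beta, *_ = np.linalg.lstsq(C, Z, rcond=None)
Z_resid = Z - C @ beta
print(f"before: {max_abs_corr(Z, s):.2f}, after: {max_abs_corr(Z_resid, s):.2f}")
# a large post-residualization value flags an artifact the four confounds miss
```

Second, the promised robustness statistics: repeat the traversal from many starting latent points, collect one monotonicity score per traversal (e.g., Spearman rho), and bootstrap a 95% confidence interval on the median.

```python
# Hedged sketch: bootstrap CI over per-traversal monotonicity scores.
import numpy as np

rng = np.random.default_rng(4)
rhos = np.clip(rng.normal(loc=0.85, scale=0.10, size=200), -1, 1)  # one rho per start point

boot = [np.median(rng.choice(rhos, size=rhos.size)) for _ in range(2000)]
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"median rho = {np.median(rhos):.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```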

Circularity Check

0 steps flagged

No significant circularity in the steering derivation chain

full rationale

The paper trains an unsupervised Transformer-VAE on SELFIES, fits linear probes to external RDKit descriptors to obtain candidate directions, then validates monotonic steering via residualization on explicit string statistics (length, branch/ring tokens, entropy) plus decoded-molecule checks. No load-bearing step fits the reported steering results to themselves by construction, and none invokes self-citation for a uniqueness theorem or ansatz. The confound controls and evaluation metrics are independently measurable and falsifiable outside the probe directions themselves.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The work rests on standard VAE assumptions plus the premise that RDKit descriptors provide ground-truth chemical properties independent of the SELFIES encoding.

axioms (2)
  • domain assumption Linear probes on frozen latent vectors can recover chemically meaningful directions when confounds are removed
    Invoked when using probe weights as steering directions after residualization
  • ad hoc to paper SELFIES length, branch/ring tokens, and token entropy are the primary representation confounds that must be controlled
    Stated as strongly encoded and the basis for the confound-aware checks

pith-pipeline@v0.9.0 · 5499 in / 1401 out tokens · 43893 ms · 2026-05-08T13:01:13.941926+00:00 · methodology


Reference graph

Works this paper leans on

36 extracted references · 22 canonical work pages · 2 internal anchors

  1. [1]

    Inverse molecular design using machine learning:

    Benjamin Sanchez-Lengeling and Alan Aspuru-Guzik. Inverse molecular design using machine learning: Generative models for matter engineering.Science, 361(6400):360–365, 2018. doi: 10.1126/science.aat2663

  2. [2]

    Applications of machine learning in drug discovery and development , journal =

    Jessica Vamathevan, Dominic Clark, Paul Czodrowski, Ian Dunham, Edgardo Ferran, George Lee, Bin Li, Anant Madabhushi, Pankaj Shah, Michaela Spitzer, and Shanrong Zhao. Applica- tions of machine learning in drug discovery and development.Nature Reviews Drug Discovery, 18(6):463–477, 2019. doi: 10.1038/s41573-019-0024-5

  3. [3]

    MolGenSurvey: A systematic survey in machine learning models for molecule design.arXiv preprint arXiv:2203.14500, 2022

    Yuanqi Du, Tianfan Fu, Jimeng Sun, and Shengchao Liu. MolGenSurvey: A systematic survey in machine learning models for molecule design.arXiv preprint arXiv:2203.14500, 2022

  4. [4]

    Smiles, a chemical language and information system

    David Weininger. SMILES, a chemical language and information system. 1. introduction to methodology and encoding rules.Journal of Chemical Information and Computer Sciences, 28 (1):31–36, 1988. doi: 10.1021/ci00057a005

  5. [5]

    Smiles enumeration as data augmentation for neural network modeling of molecules.ArXiv, abs/1703.07076, 2017

    Esben Jannik Bjerrum. SMILES enumeration as data augmentation for neural network modeling of molecules.arXiv preprint arXiv:1703.07076, 2017

  6. [6]

    Machine Learning: Science and Technology2(3), 035023 (2021) https://doi.org/10.1088/2632-2153/ abf0f5

    Mario Krenn, Florian Hase, AkshatKumar Nigam, Pascal Friederich, and Alan Aspuru-Guzik. Self-referencing embedded strings (SELFIES): A 100% robust molecular string representation. Machine Learning: Science and Technology, 1(4):045024, 2020. doi: 10.1088/2632-2153/ aba947

  7. [7]

    Gomez, Lukasz Kaiser, and Illia Polosukhin

    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. InAdvances in Neural Informa- tion Processing Systems, volume 30, 2017

  8. [8]

    BERT: Pre-training of deep bidirectional transformers for language understanding

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. InProceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 4171–4186. Association for Computational Linguistics,

  9. [9]

    doi: 10.18653/v1/N19-1423

  10. [10]

    Lawrence Zitnick, Jerry Ma, and Rob Fergus

    Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences.Proceedings of the National Academy of Sciences, 118(15):e2016239118, 2021. doi: 10.1073/pnas.2016239118

  11. [11]

    Shion Honda, Shoi Shi, and Hiroki R. Ueda. SMILES transformer: Pre-trained molecular fingerprint for low data drug discovery.arXiv preprint arXiv:1911.04738, 2019. doi: 10.48550/ arXiv.1911.04738

  12. [12]

    Seyone Chithrananda, Gabriel Grand, Bharath Ramsun- dar, et al

    Seyone Chithrananda, Gabriel Grand, and Bharath Ramsundar. ChemBERTa: Large-scale self-supervised pretraining for molecular property prediction.arXiv preprint arXiv:2010.09885,

  13. [13]

    doi: 10.48550/arXiv.2010.09885. 10

  14. [14]

    SELFormer: Molecular represen- tation learning via SELFIES language models.arXiv preprint arXiv:2304.04662, 2023

    Atakan Yuksel, Erva Ulusoy, Atabey Unlu, and Tunca Dogan. SELFormer: Molecular represen- tation learning via SELFIES language models.arXiv preprint arXiv:2304.04662, 2023. doi: 10.48550/arXiv.2304.04662

  15. [15]

    Viraj Bagal, Rishal Aggarwal, P. K. Vinod, and U. Deva Priyakumar. MolGPT: Molecular generation using a transformer-decoder model.Journal of Chemical Information and Modeling, 62(9):2064–2076, 2021. doi: 10.1021/acs.jcim.1c00600

  16. [16]

    Graph convolutional policy network for goal-directed molecular graph generation

    Jiaxuan You, Bowen Liu, Rex Ying, Vijay Pande, and Jure Leskovec. Graph convolutional policy network for goal-directed molecular graph generation. InAdvances in Neural Information Processing Systems, volume 31, 2018

  17. [17]

    Zare, and Patrick Riley

    Zhou Zhou, Steven Kearnes, Li Li, Richard N. Zare, and Patrick Riley. Optimization of molecules via deep reinforcement learning.Scientific Reports, 9:10752, 2019. doi: 10.1038/ s41598-019-47148-x

  18. [18]

    Jan H. Jensen. A graph-based genetic algorithm and generative model/Monte Carlo tree search for the exploration of chemical space.Chemical Science, 10(12):3567–3572, 2019. doi: 10.1039/C8SC05372C

  19. [19]

    Coley, and Jimeng Sun

    Tianfan Fu, Wenhao Gao, Connor W. Coley, and Jimeng Sun. Reinforced genetic algorithm for structure-based drug design. InAdvances in Neural Information Processing Systems, volume 35, pages 12325–12338, 2022

  20. [20]

    Automatic chemical design using a data-driven continuous representation of molecules.ACS Cent

    Rafael Gomez-Bombarelli, Jennifer N. Wei, David Duvenaud, Jose Miguel Hernandez-Lobato, Benjamin Sanchez-Lengeling, Dennis Sheberla, Jorge Aguilera-Iparraguirre, Timothy D. Hirzel, Ryan P. Adams, and Alan Aspuru-Guzik. Automatic chemical design using a data-driven continuous representation of molecules.ACS Central Science, 4(2):268–276, 2018. doi: 10.1021...

  21. [21]

    Junction tree variational autoencoder for molecular graph generation

    Wengong Jin, Regina Barzilay, and Tommi Jaakkola. Junction tree variational autoencoder for molecular graph generation. InProceedings of the 35th International Conference on Machine Learning, volume 80 ofProceedings of Machine Learning Research, pages 2323–2332. PMLR, 2018

  22. [22]

    MoFlow: An invertible flow model for generating molecular graphs

    Chengxi Zang and Fei Wang. MoFlow: An invertible flow model for generating molecular graphs. InProceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 617–626, 2020. doi: 10.1145/3394486.3403104

  23. [23]

    DiGress: Discrete denoising diffusion for graph generation

    Clement Vignac, Igor Krawczuk, Antoine Siraudin, Bohan Wang, V olkan Cevher, and Pas- cal Frossard. DiGress: Discrete denoising diffusion for graph generation. InInternational Conference on Learning Representations, 2023

  24. [24]

    Constrained bayesian optimization for automatic chemical design using variational autoencoders.Chemical Science, 11(2):577–586,

    Ryan-Rhys Griffiths and Jose Miguel Hernandez-Lobato. Constrained bayesian optimization for automatic chemical design using variational autoencoders.Chemical Science, 11(2):577–586,

  25. [25]

    doi: 10.1039/C9SC04026A

  26. [26]

    Gilson, and Rose Yu

    Peter Eckmann, Kunyang Sun, Bo Zhao, Mudong Feng, Michael K. Gilson, and Rose Yu. LIMO: Latent inceptionism for targeted molecule generation. InProceedings of the 39th International Conference on Machine Learning, volume 162 ofProceedings of Machine Learning Research, pages 5777–5792. PMLR, 2022

  27. [27]

    Shah, Shengchao Liu, Jie Zhang, and Bolei Zhou

    Yuanqi Du, Xinhao Liu, Neil M. Shah, Shengchao Liu, Jie Zhang, and Bolei Zhou. ChemSpace: Interpretable and interactive chemical space exploration.Transactions on Machine Learning Research, 2023

  28. [28]

    Navigating chemical space with latent flows

    Guanghao Wei, Yining Huang, Chenru Duan, Yue Song, and Yuanqi Du. Navigating chemical space with latent flows. InAdvances in Neural Information Processing Systems, volume 37, 2024

  29. [29]

    Molgan: An implicit generative model for small molecular graphs.arXiv preprint arXiv:1805.11973, 2018

    Nicola De Cao and Thomas Kipf. MolGAN: An implicit generative model for small molecular graphs.arXiv preprint arXiv:1805.11973, 2018. 11

  30. [30]

    Kingma and Max Welling

    Diederik P. Kingma and Max Welling. Auto-encoding variational bayes. InInternational Conference on Learning Representations, 2014

  31. [31]

    Understanding intermediate layers using linear classifier probes

    Guillaume Alain and Yoshua Bengio. Understanding intermediate layers using linear classifier probes.arXiv preprint arXiv:1610.01644, 2016

  32. [32]

    Designing and Interpreting Probes with Control Tasks

    John Hewitt and Percy Liang. Designing and interpreting probes with control tasks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, pages 2733–2743. Association for Computational Linguistics, 2019. doi: 10.18653/v1/D19-1275

  33. [33]

    Computational Linguistics , year =

    Yonatan Belinkov. Probing classifiers: Promises, shortcomings, and advances.Computational Linguistics, 48(1):207–219, 2022. doi: 10.1162/coli_a_00422

  34. [34]

    Unravel- ing molecular structure: A multimodal spectroscopic dataset for chemistry.Advances in Neural Information Processing Systems, 37:125780–125808, 2024

    Marvin Alberts, Oliver Schilter, Federico Zipoli, Nina Hartrampf, and Teodoro Laino. Unravel- ing molecular structure: A multimodal spectroscopic dataset for chemistry.Advances in Neural Information Processing Systems, 37:125780–125808, 2024. 12 Appendix A Code, dataset, and additional data info A.1 Experimental Setup and Reproducibility All code, noteboo...

  35. [35]

    Conclusion, limita- tions and future directions

    Dataset splits use train_test_split(..., random_state=42, shuffle=True) for both split stages. The notebooks seed Python random, NumPy, torch.manual_seed, and torch.cuda.manual_seed_all when CUDA is available; explicit generators use np.random.default_rng(42)or documented offsets from this seed. Exact floating-point values may vary slightly across PyTorch...

  36. [36]

    Institutional review board (IRB) approvals or equivalent for research with human subjects Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country or ...