Multimodal Protein Language Models for Enzyme Kinetic Parameters: From Substrate Recognition to Conformational Adaptation

2026 · cs.CV · arXiv 2603.12845

2 Pith papers cite this work. Polarity classification is still indexing.



abstract

Enzyme kinetic parameter prediction aims to quantify how efficiently an enzyme catalyzes the conversion of a specific substrate under defined biochemical conditions. Canonical parameters such as the turnover number ($k_\text{cat}$), Michaelis constant ($K_\text{m}$), and inhibition constant ($K_\text{i}$) depend jointly on the enzyme sequence, the substrate chemistry, and the conformational adaptation of the active site during binding. Many learning pipelines reduce this process to a static compatibility problem between enzyme and substrate, fusing their representations through shallow operations and regressing a single value. Such formulations overlook the staged nature of catalysis, which involves both substrate recognition and conformational adaptation. We therefore reformulate kinetic prediction as a staged multimodal conditional modeling problem and introduce the Enzyme-Reaction Bridging Adapter (ERBA), which injects cross-modal information into Protein Language Models (PLMs) via fine-tuning while preserving their biochemical priors. ERBA performs conditioning in two stages: Molecular Recognition Cross-Attention (MRCA) first injects substrate information into the enzyme representation to capture specificity; Geometry-aware Mixture-of-Experts (G-MoE) then integrates active-site structure and routes samples to pocket-specialized experts to reflect induced fit. To maintain semantic fidelity, Enzyme-Substrate Distribution Alignment (ESDA) enforces distributional consistency within the PLM manifold in a reproducing kernel Hilbert space. Across three kinetic endpoints and multiple PLM backbones, ERBA delivers consistent gains and stronger out-of-distribution performance than sequence-only and shallow-fusion baselines, offering a biologically grounded route to scalable kinetic prediction and a foundation for adding cofactors, mutations, and time-resolved structural cues.
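
The abstract names the components but gives no equations, so the following is only a rough illustration of two of the ideas, not the authors' implementation. It assumes single-head scaled dot-product cross-attention without learned projections for the recognition step, and a Gaussian-kernel maximum mean discrepancy (MMD) as one common way to compare distributions in a reproducing kernel Hilbert space. The function names, the residual connection, and the `bandwidth` parameter are all hypothetical, and the G-MoE stage is omitted.

```python
import numpy as np


def cross_attention(enzyme, substrate):
    """Condition enzyme residue embeddings on substrate token embeddings.

    enzyme:    (n_residues, d) array, used as queries.
    substrate: (n_tokens, d) array, used as keys and values.
    Learned projection matrices are left out to keep the sketch short.
    """
    d = enzyme.shape[-1]
    scores = enzyme @ substrate.T / np.sqrt(d)             # (n_residues, n_tokens)
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over tokens
    return enzyme + weights @ substrate                    # residual update


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of x and the rows of y."""
    sq_dist = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * bandwidth**2))


def mmd2(x, y, bandwidth=1.0):
    """Biased estimate of the squared MMD between samples x and y."""
    return (
        gaussian_kernel(x, x, bandwidth).mean()
        + gaussian_kernel(y, y, bandwidth).mean()
        - 2.0 * gaussian_kernel(x, y, bandwidth).mean()
    )
```

Under these assumptions, an alignment term could penalize `mmd2` between the conditioned enzyme embeddings and the original PLM embeddings, so that the injected substrate information does not pull representations far from the pretrained manifold.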

representative citing papers

3D Smoke Scene Reconstruction Guided by Vision Priors from Multimodal Large Language Models

cs.CV · 2026-04-07 · unverdicted · novelty 5.0

A framework that combines MLLM-based image enhancement with a medium-aware 3D Gaussian Splatting model to reconstruct and render smoke scenes.

Beyond Shortcuts: Mitigating Visual Illusions in Frozen VLMs via Qualitative Reasoning

cs.CV · 2026-04-29 · unverdicted · novelty 4.0

SQI uses axiomatic constraints, hierarchical decomposition, and counterfactual verification to align linguistic reasoning with visual perception in frozen VLMs, achieving second place on the DataCV 2026 illusion challenge.

citing papers explorer

Showing 2 of 2 citing papers.

3D Smoke Scene Reconstruction Guided by Vision Priors from Multimodal Large Language Models
cs.CV · 2026-04-07 · unverdicted · none · ref 48 · internal anchor
Beyond Shortcuts: Mitigating Visual Illusions in Frozen VLMs via Qualitative Reasoning
cs.CV · 2026-04-29 · unverdicted · none · ref 15 · internal anchor
