arxiv: 2604.16367 · v1 · submitted 2026-03-22 · 💻 cs.CY · cs.AI

Talk, Walk, and Market Response: Multimodal Measurement of AI Washing and Its Capital Market Consequences in China

Pith reviewed 2026-05-15 01:11 UTC · model grok-4.3

classification 💻 cs.CY cs.AI
keywords AI washing · multimodal analysis · institutional investors · site visits · capital market response · China A-shares · disclosure quality · patent innovation

The pith

Long-horizon institutional investors detect AI washing through site visits, reduce holdings, and trigger analyst downgrades plus sharp valuation corrections within 180 days.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a multimodal score to quantify the gap between AI claims in corporate reports and actual investments, then tests how markets respond. It finds that the score does not predict future real investment, that the rhetoric-action gap is wider among financially constrained firms, and that genuine investment raises patent quality while rhetoric crowds out broader innovation. Institutional investors who visit companies identify the mismatch and sell shares. This selling prompts analyst downgrades and price corrections. The patterns persist under IV-2SLS and a staggered DID that use the ChatGPT launch as an exogenous shock to AI attention.

Core claim

In China's A-share market from 2018Q1 to 2025Q2, an AI Washing Risk Score built from text-image consistency in annual reports and roadshows fails to forecast a Material Real-Investment Matching Index based on patent quality, AI intangible asset capitalization, and technical personnel compensation. Long-horizon institutions that conduct site visits reduce holdings in high-washing firms, which in turn produces analyst downgrades, retail selling, and valuation declines within 180 days.

What carries the argument

The AI Washing Risk Score (AWRS), a multimodal consistency measure between textual claims and images in disclosures, paired with the Material Real-Investment Matching Index (MRMI) that combines patent quality, AI asset capitalization, and technical compensation via principal components.
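
To make the MRMI side concrete, here is a minimal sketch of a PCA-based composite index. It assumes the index is the first principal component of three standardized indicators; the paper does not state how many components it retains, so that choice, the function name `first_pc_index`, and the synthetic data are all illustrative assumptions rather than the authors' procedure.

```python
import numpy as np

def first_pc_index(X):
    """Return (scores, loadings, explained_share) for the first principal component.

    X is an (n_firms, n_indicators) array. Columns are standardized first so
    that no indicator dominates purely because of its units.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    corr = np.cov(Z, rowvar=False)            # correlation matrix of the inputs
    eigvals, eigvecs = np.linalg.eigh(corr)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    w = eigvecs[:, 0]
    if w.sum() < 0:                           # eigenvector sign is arbitrary;
        w = -w                                # orient so higher inputs raise the index
    scores = Z @ w
    return scores, w, eigvals[0] / eigvals.sum()

# Synthetic stand-ins (assumption) for patent quality, AI intangible
# capitalization, and technical personnel compensation.
rng = np.random.default_rng(0)
latent = rng.normal(size=200)                 # unobserved "real investment"
X = np.column_stack([latent + 0.5 * rng.normal(size=200) for _ in range(3)])

mrmi, loadings, share = first_pc_index(X)
print("loadings:", np.round(loadings, 3))
print("share of variance on PC1:", round(float(share), 3))
```

Reporting the loadings and the variance share alongside the index is what makes such a composite reproducible; with a single retained component the index is only as informative as that share.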

If this is right

  • Substantive AI investments raise the production of high-quality patents.
  • Purely rhetorical AI claims reduce industry-wide innovation by displacing real R&D resources.
  • Long-horizon institutions use site visits to identify washing and cut positions.
  • Institutional divestment causes analyst downgrades, retail selling, and valuation drops within six months.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar multimodal checks could flag thematic exaggeration in other emerging technologies such as quantum or biotech.
  • Markets outside China with comparable disclosure formats might show parallel investor detection patterns.
  • Regulators could incorporate automated consistency scores into ongoing monitoring of thematic stock bubbles.

Load-bearing premise

The vision-language model used to score text-image consistency in reports measures the rhetoric-action gap without systematic bias from formatting or model limitations.
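
The simulated rebuttal further down proposes testing this premise with two independent human coders and reporting Cohen's kappa. A minimal sketch of that statistic follows; the labels are invented for illustration, and `cohens_kappa` is a hypothetical helper, not code from the paper.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(freq_a) | set(freq_b)
    expected = sum(freq_a[c] * freq_b[c] for c in labels) / (n * n)
    if expected == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is total
    return (observed - expected) / (1.0 - expected)

# Invented labels (assumption): "C" = text and image consistent, "I" = inconsistent.
coder_1 = ["C", "C", "I", "I", "C", "I", "C", "C", "I", "C"]
coder_2 = ["C", "C", "I", "C", "C", "I", "C", "I", "I", "C"]

kappa = cohens_kappa(coder_1, coder_2)
print(round(kappa, 3))
```

Here the coders agree on 8 of 10 items, but chance alone would produce 52% agreement given their label frequencies, so kappa is (0.80 - 0.52) / (1 - 0.52), roughly 0.58. The same comparison between human labels and model labels would indicate how far the VLM score tracks human judgment.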

What would settle it

Finding no measurable difference in institutional selling, analyst downgrades, or 180-day returns between high- and low-washing firms after site visits, once other earnings and governance signals are controlled.

Figures

Figures reproduced from arXiv: 2604.16367 by Guo Jingqiao, Wen Zhanjie.

Figure 1: Time Series Trend of AI Disclosure (Evolution under ChatGPT Shock)
Figure 2: Dynamic Trend of Distributed Lag Effects of AI Rhetoric vs. Substantive Investment
Figure 3: Parallel Trend Test for Staggered DID Based on ChatGPT Technology Shock

Original abstract

As artificial intelligence and generative large language models drive industrial upgrading, capital markets increasingly focus on AI-themed listed firms. Information asymmetry and technological opacity lower the cost of exaggerating AI capabilities relative to genuine R&D, spurring widespread AI Washing. Using China's A-share market from 2018Q1 to 2025Q2, we advance literature in measurement and mechanism testing. We construct a multimodal AI Washing Risk Score (AWRS) via Qwen-VL to assess text-image consistency in annual reports and roadshows, and a Material Real-Investment Matching Index (MRMI) from patent quality, AI intangible asset capitalization, and technical personnel compensation using PCA. Four findings emerge: (1) AWRS lacks predictive power for future MRMI, with a wider rhetoric-action gap among financially constrained firms; (2) substantive AI investment boosts high-quality patents, while empty rhetoric crowds out industry innovation; (3) long-horizon institutional investors detect AI Washing through site visits and reduce holdings; (4) such divestment triggers analyst downgrades, retail selling, and sharp valuation corrections within 180 days. Results are robust to IV-2SLS and staggered DID using the ChatGPT shock. This study enhances disclosure and pricing-efficiency research and supports RegTech for curbing thematic speculation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper constructs a multimodal AI Washing Risk Score (AWRS) via Qwen-VL analysis of text-image consistency in Chinese A-share annual reports and roadshows, alongside a Material Real-Investment Matching Index (MRMI) derived via PCA from patent quality, AI intangible capitalization, and technical personnel compensation. Using 2018Q1–2025Q2 data, it reports that AWRS lacks predictive power for future MRMI (with larger gaps among financially constrained firms), that substantive AI investment raises high-quality patents while rhetoric crowds out innovation, that long-horizon institutions detect washing via site visits and reduce holdings, and that such divestment triggers analyst downgrades, retail selling, and valuation corrections within 180 days. Identification relies on IV-2SLS and staggered DID exploiting the ChatGPT shock.

Significance. If the AWRS measurement is shown to be valid, the paper would advance disclosure-quality and pricing-efficiency research by providing a scalable multimodal approach to detecting rhetoric-action gaps in emerging-market AI disclosures and by documenting investor detection and subsequent market corrections. The combination of VLM-based measurement with an external shock for identification is a methodological strength that could support RegTech applications.

major comments (2)
  1. [AWRS construction] AWRS construction (abstract and measurement section): the claim that Qwen-VL reliably captures the rhetoric-action gap rests on an unvalidated VLM application to Chinese-language filings containing embedded charts and technical phrasing. No human validation, inter-rater reliability statistics, or robustness checks to prompt variation, image resolution, or layout artifacts are reported; any systematic bias would directly undermine the interpretation of findings (3) and (4) on institutional detection, holdings reductions, downgrades, and 180-day corrections.
  2. [Identification strategy] Identification strategy (IV-2SLS and staggered DID sections): the ChatGPT shock is invoked for causal identification, yet the manuscript does not report the precise exclusion restriction, first-stage diagnostics, or parallel-trends tests for the DID specification. These details are load-bearing for the claim that divestment by long-horizon investors triggers the observed market responses.
minor comments (2)
  1. [Abstract] The abstract states that MRMI is obtained via PCA but omits the number of retained components, variance explained, or loading thresholds; adding these would improve reproducibility.
  2. [Results] Any table or figure presenting AWRS-MRMI correlations should include robustness to alternative VLM prompts or report formats, to address potential formatting sensitivity.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. We address each major point below and will incorporate revisions to enhance the validity and transparency of our measurement and identification strategies.

  1. Referee: [AWRS construction] AWRS construction (abstract and measurement section): the claim that Qwen-VL reliably captures the rhetoric-action gap rests on an unvalidated VLM application to Chinese-language filings containing embedded charts and technical phrasing. No human validation, inter-rater reliability statistics, or robustness checks to prompt variation, image resolution, or layout artifacts are reported; any systematic bias would directly undermine the interpretation of findings (3) and (4) on institutional detection, holdings reductions, downgrades, and 180-day corrections.

    Authors: We agree that the current version lacks sufficient validation for the Qwen-VL application. In the revised manuscript, we will add a human validation exercise on a stratified random sample of 500 Chinese annual reports and roadshows, with two independent coders assessing text-image consistency. We will report inter-rater reliability (Cohen's kappa and percentage agreement). Additionally, we will include robustness checks for prompt variations, image resolution downsampling, and layout artifacts (e.g., by masking charts). These will be presented in a new subsection of the measurement section to support the reliability of AWRS. revision: yes

  2. Referee: [Identification strategy] Identification strategy (IV-2SLS and staggered DID sections): the ChatGPT shock is invoked for causal identification, yet the manuscript does not report the precise exclusion restriction, first-stage diagnostics, or parallel-trends tests for the DID specification. These details are load-bearing for the claim that divestment by long-horizon investors triggers the observed market responses.

    Authors: We acknowledge the omission of key identification diagnostics. In the revision, we will explicitly articulate the exclusion restriction: the ChatGPT shock (November 2022) is treated as an exogenous increase in global AI salience that affects disclosure rhetoric and investor scrutiny but does not directly alter firm-level AI investment opportunities or fundamentals in the short run. We will report first-stage F-statistics and weak-instrument tests for the IV-2SLS specification. For the staggered DID, we will add parallel-trends tests, pre-trend coefficients, and event-study figures showing no differential trends prior to the shock. These will be placed in the identification and robustness sections. revision: yes
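
The first-stage diagnostic promised in the second response can be illustrated with a small two-stage least squares sketch. It assumes a single instrument and no controls, runs on synthetic data, and uses hypothetical names (`tsls_single_instrument`, `z`, `d`, `y`); it is not the paper's specification. It returns only the point estimate, because standard errors taken from a hand-run second stage are not valid.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y = X @ beta + error."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def tsls_single_instrument(y, d, z):
    """2SLS slope of y on endogenous d, instrumented by z, plus first-stage F."""
    y, d, z = (np.asarray(v, dtype=float) for v in (y, d, z))
    n = y.shape[0]
    const = np.ones(n)
    Z = np.column_stack([const, z])
    # First stage: project the endogenous regressor on the instrument.
    d_hat = Z @ ols(Z, d)
    rss_unrestricted = np.sum((d - d_hat) ** 2)
    rss_restricted = np.sum((d - d.mean()) ** 2)   # intercept-only model
    # F-test of the one excluded instrument (1 restriction, n - 2 residual df).
    first_stage_f = (rss_restricted - rss_unrestricted) / (rss_unrestricted / (n - 2))
    # Second stage: regress the outcome on the fitted values.
    beta = ols(np.column_stack([const, d_hat]), y)
    return beta[1], first_stage_f

# Synthetic data (assumption): u is an unobserved confounder, z an exogenous shock.
rng = np.random.default_rng(1)
n = 5000
u = rng.normal(size=n)
z = rng.normal(size=n)
d = 0.8 * z + u + rng.normal(size=n)
y = 1.5 * d + 2.0 * u + rng.normal(size=n)   # true effect of d on y is 1.5

beta_ols = ols(np.column_stack([np.ones(n), d]), y)[1]
beta_iv, first_stage_f = tsls_single_instrument(y, d, z)
print("OLS slope:", round(float(beta_ols), 2))
print("2SLS slope:", round(float(beta_iv), 2))
print("first-stage F:", round(float(first_stage_f), 1))
```

In this simulation OLS overstates the effect because the confounder moves both variables, while 2SLS recovers a value near the true 1.5. A large first-stage F speaks only to instrument relevance; the exclusion restriction the authors describe cannot be verified this way and has to be argued.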

Circularity Check

0 steps flagged

No significant circularity in empirical measurement and causal identification

The paper constructs AWRS via Qwen-VL text-image consistency on reports/roadshows and MRMI via PCA on patent/investment metrics, then tests relations through regressions with IV-2SLS and ChatGPT-staggered DID. These steps use external data sources and instruments rather than any self-referential definitions, fitted parameters renamed as predictions, or load-bearing self-citations. No equations or claims reduce the key findings (e.g., AWRS lacking predictive power for MRMI, or institutional detection via site visits) to tautologies by construction.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 2 invented entities

The central claim depends on the validity of two newly invented measurement constructs and standard domain assumptions about market information processing and causal identification.

free parameters (1)
  • PCA component selection for MRMI
    The number and weighting of principal components combining patent quality, AI intangible capitalization, and technical personnel compensation are derived from the data.
axioms (1)
  • domain assumption: Markets incorporate information from institutional trading and analyst actions into prices within 180 days
    The valuation correction result assumes semi-strong efficiency in processing the divestment signal.
invented entities (2)
  • AI Washing Risk Score (AWRS) · no independent evidence
    purpose: Quantify discrepancy between AI rhetoric in text and visual content of reports
    New multimodal construct introduced to measure washing risk.
  • Material Real-Investment Matching Index (MRMI) · no independent evidence
    purpose: Measure substantive AI investment from patents, assets, and compensation
    New composite index constructed via PCA.

pith-pipeline@v0.9.0 · 5534 in / 1449 out tokens · 87935 ms · 2026-05-15T01:11:22.829303+00:00 · methodology
