Where Concept Erasure Should Occur: Concept-Layer Alignment in Text-to-Video Diffusion Models

Ping Liu; Yiwei Xie; Zheng Zhang

arxiv: 2605.25941 · v1 · pith:GNONRZCMnew · submitted 2026-05-25 · 💻 cs.CV

Pith reviewed 2026-06-29 22:34 UTC · model grok-4.3

classification 💻 cs.CV

keywords concept erasure · text-to-video diffusion · concept-layer alignment · representational separability · diffusion transformers · model depth · semantic encoding · CLEAR

The pith

Aligning concept erasure to depths where target concepts separate most cleanly from other signals produces more precise suppression in text-to-video diffusion models while keeping generation quality intact.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that semantic information is encoded unevenly across the depth of text-to-video diffusion transformers, so that target concepts become more separable from non-target content at particular representational depths. It identifies a structural bottleneck called concept-layer topological alignment that makes layer-agnostic or heuristic erasure methods less effective because signals remain entangled outside the aligned depths. CLEAR is presented as a method that turns layer selection into an explicit optimization problem over measured concept-non-target separability. Experiments on large-scale models show that this alignment produces tighter concept suppression without degrading overall video quality.

Core claim

Text-to-video diffusion transformers encode semantic information unevenly across model depth, which creates a representational bottleneck termed concept-layer topological alignment. Under this bottleneck, target concepts exhibit higher separability at certain depths while remaining strongly entangled with non-target signals elsewhere. CLEAR reframes concept erasure as the task of locating and using those depths, operationalized by a separability-aware objective that selects layers through optimization rather than fixed or heuristic rules. Experiments confirm that enforcing this alignment yields more precise concept suppression while preserving generative quality.
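The depth-dependence described above can be made concrete by scoring each layer on how cleanly target features separate from non-target features. The sketch below is an illustration only: the Fisher-style ratio, the function names, and the synthetic Gaussian features are assumptions introduced here, since this page does not state the paper's actual separability objective.

```python
import random

def mean_vec(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def fisher_separability(target, non_target):
    """Assumed stand-in metric: squared distance between class means
    divided by the summed within-class variance (higher = more separable)."""
    mu_t, mu_n = mean_vec(target), mean_vec(non_target)
    between = sum((a - b) ** 2 for a, b in zip(mu_t, mu_n))
    within = 0.0
    for rows, mu in ((target, mu_t), (non_target, mu_n)):
        within += sum(
            sum((x - m) ** 2 for x, m in zip(row, mu)) for row in rows
        ) / len(rows)
    return between / (within + 1e-12)

def toy_layer_features(gap, n=64, dim=8, seed=0):
    """Hypothetical data: two Gaussian clouds whose means differ by `gap`
    along the first coordinate. Real hidden states would replace this."""
    rng = random.Random(seed)
    target = [
        [rng.gauss(gap if d == 0 else 0.0, 1.0) for d in range(dim)]
        for _ in range(n)
    ]
    non_target = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
    return target, non_target

# Toy depth profile: the concept is constructed to be most separable at index 2.
gaps = [0.2, 1.0, 4.0, 1.0, 0.2]
scores = [
    fisher_separability(*toy_layer_features(g, seed=i))
    for i, g in enumerate(gaps)
]
best_layer = max(range(len(scores)), key=scores.__getitem__)
print(best_layer, [round(s, 3) for s in scores])
```

Any per-layer score with the same "higher means more separable" reading could be substituted for the ratio used here.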

What carries the argument

concept-layer topological alignment: the depth-dependent separability bottleneck in which target concepts naturally disentangle from non-target signals at specific representational layers, used to guide layer selection in the CLEAR optimization framework.

If this is right

Erasure methods that ignore depth-specific separability will remain limited by entanglement outside the aligned layers.
Layer selection formulated as separability optimization replaces heuristic or layer-agnostic choices.
Precise suppression of target concepts becomes feasible while overall video generation quality is maintained.
The same structural constraint applies across large-scale text-to-video diffusion transformers.
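The Figure 3 caption says the layer index is searched differentiably with a Gumbel-Softmax relaxation. A minimal sketch of that idea follows, under strong assumptions: the per-layer separability scores are fixed placeholder numbers, the objective is simply the relaxed expected score, and the learning rate and annealing schedule are invented for illustration. In the paper the search is coupled to training, which this sketch does not attempt to reproduce.

```python
import math
import random

def gumbel_softmax(logits, tau, rng):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / tau)."""
    noisy = []
    for logit in logits:
        u = rng.uniform(1e-9, 1.0 - 1e-9)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / tau)
    top = max(noisy)
    exps = [math.exp(v - top) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]

def search_layer(scores, steps=2000, lr=0.5, tau_start=5.0, tau_end=0.5, seed=0):
    """Gradient ascent on the relaxed objective sum_i y_i * scores[i],
    with the temperature annealed geometrically (assumed schedule)."""
    rng = random.Random(seed)
    logits = [0.0] * len(scores)  # uniform initialization over layers
    for t in range(steps):
        tau = tau_start * (tau_end / tau_start) ** (t / (steps - 1))
        y = gumbel_softmax(logits, tau, rng)
        expected = sum(yi * si for yi, si in zip(y, scores))
        # d/d logit_k of sum_i y_i * s_i, for y = softmax((logits + g) / tau)
        grad = [yk * (sk - expected) / tau for yk, sk in zip(y, scores)]
        logits = [l + lr * g for l, g in zip(logits, grad)]
    return logits

# Placeholder separability scores for five hypothetical layers.
layer_scores = [0.1, 0.3, 1.0, 0.4, 0.1]
final_logits = search_layer(layer_scores)
selected_layer = max(range(len(final_logits)), key=final_logits.__getitem__)
print(selected_layer)
```

As the temperature falls, the relaxed samples move toward one-hot vectors, which matches the sharpening from uniform to near-deterministic selection described in the Figure 5 caption.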

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same separability-driven layer selection could be tested on image-only diffusion models to check whether the alignment phenomenon is specific to the temporal components of video generation.
If separability peaks shift with model scale or training data, the optimization in CLEAR would need to be rerun per model rather than transferred.
The approach suggests that future editing techniques might benefit from mapping multiple concepts simultaneously to find shared or conflicting alignment depths.

Load-bearing premise

Target concepts reliably show higher separability from non-target content at particular model depths, and selecting layers by that separability metric will improve erasure without creating new entanglements or quality loss.

What would settle it

A controlled comparison in which layers chosen by the separability metric produce no measurable improvement in concept suppression or introduce greater quality degradation than layers chosen uniformly at random.
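The comparison described above can be sketched as a small bookkeeping routine. Everything in it is hypothetical: the function name, the choice of statistics, and the per-layer numbers are placeholders for illustration, not results from the paper. It assumes that erasure has already been run at every candidate layer and that a suppression measure (lower is better) and a quality score (higher is better) were recorded for each.

```python
def random_layer_comparison(gen_rate, quality, chosen):
    """Compare the selected layer against a uniformly random layer choice.

    gen_rate: per-layer suppression measure, lower is better.
    quality:  per-layer quality score, higher is better.
    chosen:   index of the layer picked by the separability metric.
    """
    n = len(gen_rate)
    mean_rate = sum(gen_rate) / n        # expected rate under a random choice
    mean_quality = sum(quality) / n      # expected quality under a random choice
    as_good = sum(1 for r in gen_rate if r <= gen_rate[chosen])
    return {
        "suppression_gain": mean_rate - gen_rate[chosen],
        "quality_change": quality[chosen] - mean_quality,
        "frac_random_as_good": as_good / n,
    }

# Placeholder values only; they are not taken from the paper.
toy_rates = [0.8, 0.5, 0.1, 0.4, 0.9]
toy_quality = [0.70, 0.71, 0.70, 0.69, 0.70]
report = random_layer_comparison(toy_rates, toy_quality, chosen=2)
print(report)
```

A suppression gain near zero, or a clearly negative quality change, would be the kind of outcome that counts against the premise.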

Figures

Figures reproduced from arXiv: 2605.25941 by Ping Liu, Yiwei Xie, Zheng Zhang.

**Figure 1.** Impact of intervention depth on erasure efficacy. (Left) The orange dashed boxes indicate optimal intervention layers. Visual results show that deviating from these topological sweet spots results in suboptimal erasure or semantic persistence. (Right) The “V-shaped” Generative Rate (percentage of frames in the generated video containing the target concept, lower is better) curves confirm that different con…

**Figure 2.** Visualization of Concept-Layer Topological Alignment across Depths. We compare the layer-wise feature distributions of two distinct concept types using t-SNE. Top: Object concept (e.g., Parachute). Bottom: Sensitive concept (e.g., nudity). … layers (Li et al., 2025). As a result, effective concept erasure requires intervening at depths where target and non-target semantics are more clearly separated. Iden…

**Figure 3.** The pipeline of CLEAR. During training, we utilize discrete positive/negative prompt pairs and Gumbel-Softmax relaxation to differentiably search for the layer index N that maximizes concept separability. The framework trains a SAE to decompose the hidden states, using a contrastive loss (Lcon) to isolate the target concept direction (Wspe) from universal features (Wuni). During inference, the learned SAE …

**Figure 4.** Qualitative comparison of concept erasure. We present visual results of CLEAR alongside three state-of-the-art baselines: NegPrompt, SAFREE, and T2VUnlearning. … 5B (Wan et al., 2025) and CogVideoX-2B (Yang et al., 2025), covering different model scales. Interventions are applied to blocks in the T5-based text encoder. We compare against three representative baselines: NegPrompt (Li et al., 2024), SAFREE (Y…

**Figure 5.** Evolution of layer selection probabilities during the CLEAR search process. The x-axis represents the training iterations (0 to 2500/3750), and the y-axis represents the probability of the dominant layer. As the Gumbel-Softmax temperature anneals, the probability distribution sharpens from a uniform initialization to a deterministic selection, pinpointing distinct topological depths for different concepts.…

**Figure 6.** Qualitative comparison of object concepts on Wan2.2-5B. Comparing Origin, NegPrompt, T2VUnlearning, and CLEAR. … F. Additional Results on Objects Concepts. Due to space constraints in the main text, we presented averaged metrics of overall consistency, imaging quality and aesthetic quality on objects concepts. Here, we provide the detailed breakdown of erasure performance across all 10 individual object categ…

**Figure 7.** Qualitative comparison of nudity concepts on Wan2.2-5B. Comparing Origin, NegPrompt, T2VUnlearning, and CLEAR.

**Figure 8.** Qualitative comparison of celebrity concepts on Wan2.2-5B. Comparing Origin, NegPrompt, T2VUnlearning, and CLEAR.

**Figure 9.** Qualitative comparison of artist styles concepts on Wan2.2-5B under CLEAR. We present a generation matrix to evaluate the precision of CLEAR in disentangling different artistic styles. Columns correspond to prompts conditioning on five distinct artists (Andy Warhol, Picasso, Van Gogh, Rembrandt, Caravaggio). Rows indicate the specific erasure intervention applied (e.g., the second row “Warhol” shows result…
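The Figure 1 caption defines Generative Rate as the percentage of frames in a generated video that contain the target concept, with lower values being better. A minimal sketch of that computation is given below. The per-frame flags are assumed to come from some concept detector that is not specified on this page, and the layer indices and flag values are placeholders rather than measurements.

```python
def generative_rate(frame_flags):
    """Percentage of frames flagged as containing the target concept."""
    flags = list(frame_flags)
    if not flags:
        raise ValueError("need at least one frame")
    return 100.0 * sum(1 for f in flags if f) / len(flags)

def best_depth(flags_by_layer):
    """Return the intervention layer with the lowest Generative Rate,
    together with the full per-layer curve."""
    rates = {
        layer: generative_rate(flags)
        for layer, flags in flags_by_layer.items()
    }
    return min(rates, key=rates.get), rates

# Placeholder detector outputs for three hypothetical intervention layers.
toy_flags = {
    4: [1, 1, 1, 0],
    12: [0, 0, 1, 0],
    20: [1, 1, 0, 1],
}
layer, rates = best_depth(toy_flags)
print(layer, rates)
```

Plotting `rates` against layer index would give the kind of curve the caption calls V-shaped, with the minimum marking the preferred intervention depth.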

Original abstract

Text-to-video diffusion transformers encode semantic information unevenly across model depth, which constrains effective concept erasure. We identify a representational bottleneck, termed concept-layer topological alignment, under which target concepts exhibit higher separability at certain representational depths. Outside these depths, concept and non-target signals remain strongly entangled, limiting the effectiveness of depth-specific erasure. This observation reframes concept erasure as the problem of identifying representational depths where concept-non-target separation naturally emerges. Motivated by this structural constraint, we introduce CLEAR, a separability-driven optimization framework for concept erasure that explicitly enforces concept-layer alignment. CLEAR operationalizes this principle by formulating layer selection as an optimization problem over concept-non-target separability, rather than relying on layer-agnostic or heuristic choices. To enable this, we introduce a separability-aware objective that favors layers exhibiting stronger concept-non-target separation. Experiments on large-scale text-to-video diffusion models demonstrate that enforcing concept-layer alignment leads to more precise concept suppression while preserving overall generative quality.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper frames concept erasure as a layer-selection problem in video diffusion models and offers an optimization method to pick better layers, but the abstract gives no details on whether the experiments actually deliver.

The main thing here is that the authors notice semantic concepts are not uniformly entangled across layers in text-to-video diffusion transformers. They argue there are depths where target concepts separate more cleanly from the rest, and they build CLEAR to treat layer choice as an optimization over a separability score instead of picking by hand or using a fixed rule.

That reframing is the clearest new piece. It moves the conversation from "which layer should we edit" to "how do we measure and maximize separation at each depth." If the separability objective is well-defined and the optimization is cheap, this could be a practical step for anyone doing targeted editing in these models.

The experiments are described only at the level of "we tested on large models and got better suppression with no quality drop." Without the actual numbers, baselines, or controls, it is impossible to tell whether the gain is real, whether the metric correlates with downstream erasure success, or whether the method introduces new artifacts in other layers. The topological alignment claim also rests on an unshown assumption that separability is stable enough to optimize over.

This is for researchers already working on concept control or safety edits in diffusion models. A reader who wants to try layer-aware editing might pick up the basic idea, but the paper needs the full experimental section and comparisons before it can be used as a reference. It is worth sending to referees so the claims can be checked against the data rather than the abstract.

Referee Report

0 major / 2 minor

Summary. The manuscript identifies uneven encoding of semantic information across depths in text-to-video diffusion transformers, positing a 'concept-layer topological alignment' bottleneck where target concepts show higher separability at certain representational depths. It introduces CLEAR, a separability-driven optimization framework that formulates layer selection as an optimization over concept-non-target separability (rather than heuristics) and claims that enforcing this alignment yields more precise concept suppression while preserving generative quality, supported by experiments on large-scale models.

Significance. If the experimental outcomes hold under scrutiny, the work could meaningfully advance concept erasure methods by replacing ad-hoc layer choices with an optimization grounded in measured separability, offering a structural reframing that may generalize to other generative architectures. The separability-aware objective and explicit layer-alignment principle constitute a potentially useful contribution if the metric is well-defined and the gains are shown to be robust.

minor comments (2)

The abstract introduces the term 'concept-layer topological alignment' and a 'separability-aware objective' without providing definitions, measurement procedures, or pseudocode; these details are essential for evaluating whether the optimization is well-posed or circular.
No quantitative results, baselines, metrics (e.g., suppression success rate, FID, CLIP scores), or controls for quality preservation are described, preventing assessment of whether the claimed improvements are statistically meaningful or confounded by other factors.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their summary of our work on identifying the concept-layer topological alignment bottleneck in text-to-video diffusion models and introducing the CLEAR framework. The summary accurately reflects the manuscript's contributions. No specific major comments were provided in the report, so we have no point-by-point responses. We remain available to address any additional questions or provide further experimental details to resolve the uncertainty in the recommendation.

Circularity Check

0 steps flagged

No significant circularity identified

The provided abstract and context contain no equations, derivations, fitted parameters, or self-citations that could be inspected for reduction to inputs by construction. The description of concept-layer topological alignment and the CLEAR framework is presented as an observation motivating an optimization, with claims resting on experimental results rather than any self-definitional or fitted-input structure. Per the rules, when no load-bearing step can be quoted and exhibited as circular, the score is 0 and steps remain empty. This matches the reader's note that abstract-only material precludes circularity assessment.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

Abstract-only review; no free parameters, axioms, or invented entities can be extracted beyond the high-level description of the new term 'concept-layer topological alignment'.

invented entities (1)

concept-layer topological alignment (no independent evidence)
purpose: Representational bottleneck explaining why concept erasure is depth-dependent
Introduced in the abstract as the key structural constraint; no independent evidence or falsifiable handle provided.

pith-pipeline@v0.9.1-grok · 5703 in / 1100 out tokens · 25490 ms · 2026-06-29T22:34:11.755410+00:00 · methodology

Reference graph

Works this paper leans on

12 extracted references · 9 canonical work pages · 3 internal anchors

[1] Cassano, E., Renzulli, R., Nurisso, M., Zaffaroni, M., Perotti, A., and Grangetto, M. SAEmnesia: Erasing concepts in diffusion models with supervised sparse autoencoders. arXiv preprint arXiv:2509.21379.

[2] He, Q., Weng, J., Tao, J., and Xue, H. A single neuron works: Precise concept erasure in text-to-image diffusion models. arXiv preprint arXiv:2509.21008, 2025a.
He, Z., Jin, M., Shen, B., Payani, A., Zhang, Y., and Du, M. SAE-SSV: Supervised steering in sparse representation spaces for reliable control of language models. In EMNLP, 2025b.
He, Z., Xiong,...

[3] Kong, W., Tian, Q., Zhang, Z., Min, R., Dai, Z., Zhou, J., Xiong, J., Li, X., Wu, B., Zhang, J., et al. HunyuanVideo: A systematic framework for large video generative models. arXiv preprint arXiv:2412.03603.

[4] Liu, P. and Zhang, C. Erased or dormant? Rethinking concept erasure through reversibility. arXiv preprint arXiv:2505.16174.

[5] Liu, S. and Tan, Y. Unlearning concepts from text-to-video diffusion models. arXiv preprint arXiv:2407.14209.

[6] Pham, M. V., Borkakoty, H., and Hou, Y. Where knowledge collides: A mechanistic study of intra-memory knowledge conflict in language models. arXiv preprint arXiv:2601.09445.

[7] Tian, Z., Nan, S., Xu, M., Zhai, S., Qu, W., Liu, J., Ren, K., Jia, R., and Zhang, J. Sparse autoencoder as a zero-shot classifier for concept erasing in text-to-image diffusion models. arXiv preprint arXiv:2503.09446.

[8] Wan, T., Wang, A., Ai, B., Wen, B., Mao, C., Xie, C.-W., Chen, D., Yu, F., Zhao, H., Yang, J., Zeng, J., Wang, J., Zhang, J., Zhou, J., Wang, J., Chen, J., Zhu, K., Zhao, K., Yan, K., Huang, L., Feng, M., Zhang, N., Li, P., Wu, P., Chu, R., Feng, R., Zhang, S., Sun, S., Fang, T., Wang, T., Gui, T., Weng, T., Shen, T., Lin, W., Wang, W., Wang, W., Zhou, W.... Wan: Open and Advanced Large-Scale Video Generative Models.

[9] Ye, X., Cheng, S., Wang, Y., Xiong, Y., and Li, Y. T2VUnlearning: A concept erasing method for text-to-video diffusion models. arXiv preprint arXiv:2505.17550.

[10] Our unified SAE is configured with a hidden dimension of 131,072, utilizing a standard ℓ1 penalty to induce sparsity. To ensure robust generalization across diverse semantic contexts, we construct a comprehensive training dataset using prompts synthesized by a Large Language Model (LLM). Each pair shares identical semantics, differing only by the target ...

[11] Compared with existing baselines, CLEAR more effectively suppresses identity-specific facial characteristics while preserving scene composition, pose, and overall visual realism. In contrast, inference-time baselines often retain recognizable identity cues, whereas parameter-update methods may introduce noticeable visual distortion or semantic drift. ...

[12] J. Multi-concept Erasure Experiments. Table 15 demonstrates CLEAR’s robustness in multi-concept settings. In intra-category tasks (e.g., erasing Van Gogh and Picasso simultaneously), baselines like T2VUnlearning suffer from severe catastrophic forgetting, indiscriminately degrading un-targeted related concepts (e.g., Rembrandt’s VCLIP score plunges to 0.02...
