pith. machine review for the scientific record.

arxiv: 2404.05892 · v4 · pith:B7KD4YH3 · submitted 2024-04-08 · 💻 cs.CL · cs.AI

Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence

Pith reviewed 2026-05-17 23:15 UTC · model grok-4.3

classification 💻 cs.CL cs.AI
keywords RWKV · recurrent neural networks · language models · matrix-valued states · dynamic recurrence · multilingual corpus · efficient inference · sequence modeling

The pith

Matrix-valued states and dynamic recurrence let updated RWKV models reach competitive benchmark performance while keeping RNN inference speed.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Eagle and Finch as the next versions of the RWKV architecture. It replaces the vector-valued states of RWKV-4 with multi-headed matrix-valued states and adds a dynamic recurrence mechanism. Together these increase what the model can express without giving up the linear-time inference of recurrent networks. The authors also release a new 1.12-trillion-token multilingual corpus and a fast tokenizer. They train models from 0.46B to 7.5B parameters and report that the resulting systems match or approach the results of other current models across many standard language benchmarks.

Core claim

Eagle (RWKV-5) and Finch (RWKV-6) achieve competitive performance on language benchmarks by replacing the vector-valued states of the original RWKV with multi-headed matrix-valued states and by introducing a dynamic recurrence mechanism that improves expressivity while preserving the inference efficiency of RNNs.
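To make "matrix-valued state" concrete, here is a minimal single-head sketch in plain Python. It assumes an Eagle-style update in which the head's state S is a d_k × d_v matrix: each step decays row i of S by a fixed factor w[i] and adds the outer product kᵀv, while a per-channel bonus u weights the current token, so the output is r·(S + diag(u)kᵀv). Learned projections, LayerNorm, gating, and the multi-head split are omitted, and all function names are illustrative, not taken from the RWKV codebase. The second function recomputes the same output as an explicit sum over past tokens, as a correctness check for the O(T) recurrence.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def outer(k: Vector, v: Vector) -> Matrix:
    """k^T v: a (d_k x d_v) rank-one matrix."""
    return [[ki * vj for vj in v] for ki in k]


def row_vec_times_matrix(r: Vector, m: Matrix) -> Vector:
    """r (1 x d_k) times m (d_k x d_v) -> (1 x d_v)."""
    return [sum(r[i] * m[i][j] for i in range(len(r))) for j in range(len(m[0]))]


def eagle_head_recurrent(rs: List[Vector], ks: List[Vector], vs: List[Vector],
                         w: Vector, u: Vector) -> List[Vector]:
    """O(T) recurrence with a fixed-size matrix state per head (sketch)."""
    d_k, d_v = len(ks[0]), len(vs[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    outputs = []
    for r, k, v in zip(rs, ks, vs):
        kv = outer(k, v)
        # The current token contributes through the bonus u, not through the state.
        wkv = [[state[i][j] + u[i] * kv[i][j] for j in range(d_v)] for i in range(d_k)]
        outputs.append(row_vec_times_matrix(r, wkv))
        # Decay old contents row-wise by w, then add this token's outer product.
        state = [[w[i] * state[i][j] + kv[i][j] for j in range(d_v)] for i in range(d_k)]
    return outputs


def eagle_head_explicit(rs: List[Vector], ks: List[Vector], vs: List[Vector],
                        w: Vector, u: Vector) -> List[Vector]:
    """Same outputs as an explicit O(T^2) sum over all past tokens."""
    d_k, d_v = len(ks[0]), len(vs[0])
    outputs = []
    for t in range(len(rs)):
        wkv = outer([u[i] * ks[t][i] for i in range(d_k)], vs[t])
        for s in range(t):
            kv = outer(ks[s], vs[s])
            for i in range(d_k):
                decay = w[i] ** (t - 1 - s)  # token s has been decayed t-1-s times
                for j in range(d_v):
                    wkv[i][j] += decay * kv[i][j]
        outputs.append(row_vec_times_matrix(rs[t], wkv))
    return outputs
```

Both functions return the same outputs; only the recurrent one keeps its memory fixed at one d_k × d_v matrix per head however long the sequence grows, which is what lets the model run as an RNN at inference.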

What carries the argument

Multi-headed matrix-valued states combined with a dynamic recurrence mechanism that replaces the fixed, input-independent recurrence of earlier RWKV versions.
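The "dynamic" part can be sketched by letting the per-channel decay depend on the current input. The sketch below assumes a Finch-style parameterization w_t = exp(-exp(d_t)), which in exact arithmetic keeps every decay strictly between 0 and 1 for any real d_t, and it substitutes a hypothetical linear map d_t = omega + a·x_t for the paper's own projection. The state update is assumed to be S_t = diag(w_t)·S_{t-1} + k_tᵀv_t for a single head; the bonus term and output read-out are left out to keep the focus on the time-varying decay. Names are illustrative.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def decay_from_input(x: Vector, omega: Vector, a: Matrix) -> Vector:
    """Per-channel decay w_t = exp(-exp(d_t)) with d_t = omega + a @ x (hypothetical map).

    exp(d_t) > 0, so -exp(d_t) < 0 and w_t lies in (0, 1) in exact arithmetic,
    keeping the recurrence stable however the input moves d_t.
    """
    d = [omega[i] + sum(a[i][j] * x[j] for j in range(len(x))) for i in range(len(omega))]
    return [math.exp(-math.exp(di)) for di in d]


def finch_like_states(xs: List[Vector], ks: List[Vector], vs: List[Vector],
                      omega: Vector, a: Matrix) -> Tuple[List[Matrix], List[Vector]]:
    """S_t = diag(w_t) S_{t-1} + k_t^T v_t, with w_t recomputed from x_t at every step."""
    d_k, d_v = len(ks[0]), len(vs[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    states, decays = [], []
    for x, k, v in zip(xs, ks, vs):
        w = decay_from_input(x, omega, a)
        # A fresh list is built each step, so stored states are never aliased.
        state = [[w[i] * state[i][j] + k[i] * v[j] for j in range(d_v)] for i in range(d_k)]
        states.append(state)
        decays.append(w)
    return states, decays
```

Unrolled, the final state weights token s by the product of the decays of every later step, so an input can choose, channel by channel, how quickly earlier tokens are forgotten. With a fixed w, as in the earlier RWKV versions, that product reduces to a plain power.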

If this is right

  • The architecture remains linear in sequence length at inference time, so longer contexts become practical without quadratic cost.
  • The new state representation may allow scaling to larger parameter counts while retaining the memory and speed advantages of RNN-style models.
  • Multilingual capabilities improve through the dedicated large corpus and tokenizer, potentially reducing the need for separate language-specific models.
  • Open release of models, training code, and inference code enables direct replication and further architectural experiments by others.
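The memory side of the first two points can be made concrete with a back-of-the-envelope count. The sketch assumes a head size of 64 (so h = D/64 heads) and a per-layer recurrent state made of two token-shift vectors of length D plus h matrices of size (D/h) × (D/h); the attention baseline is assumed to be a plain multi-head KV cache with no grouped-query or other compression. Both functions count stored numbers, not bytes, and their names are illustrative.

```python
def rwkv_state_floats(n_layers: int, d_model: int, head_size: int = 64) -> int:
    """Recurrent state size for an Eagle/Finch-style stack (assumed layout).

    Per layer: two token-shift vectors (2 * D) plus n_heads matrices of
    head_size x head_size, which together hold D * head_size numbers.
    """
    n_heads = d_model // head_size
    per_layer = 2 * d_model + n_heads * head_size * head_size
    return n_layers * per_layer


def kv_cache_floats(n_layers: int, d_model: int, seq_len: int) -> int:
    """Plain attention cache: one key and one value vector per token per layer."""
    return 2 * n_layers * d_model * seq_len
```

For an illustrative stack of 24 layers with D = 1024, `rwkv_state_floats(24, 1024)` gives 1,622,016 numbers (66·D per layer) at any context length, while `kv_cache_floats(24, 1024, 4096)` gives 201,326,592 numbers at 4,096 tokens and doubles every time the context doubles.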

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • If the dynamic recurrence generalizes, similar state expansions could be tested in other recurrent architectures to close the remaining gap with attention-based models.
  • The matrix state design might change how scaling laws apply, because each layer's recurrent state now holds far more information than the earlier vector-valued state allowed.
  • These changes could extend usefully to non-text sequences such as audio or time-series data where maintaining long-range dependencies with low compute is valuable.

Load-bearing premise

The observed benchmark gains come primarily from the matrix-valued states and dynamic recurrence rather than from the scale of the new training corpus or tokenizer changes.

What would settle it

Train an RWKV-4 model on the identical 1.12-trillion-token corpus and tokenizer, then compare its benchmark scores directly to the Eagle and Finch models of similar size.
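The proposed control only settles the question if the two runs differ in nothing but the architecture. One way to keep that honest is to derive the control from the treatment run's configuration and check the difference mechanically. The sketch below is hypothetical throughout: the field names and the schedule placeholder are invented for illustration, and only the model size, corpus size, and tokenizer description come from the paper's abstract.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical description of one training run; field names are illustrative."""
    architecture: str
    model_size: str
    corpus: str
    tokenizer: str
    schedule: str
    seed: int


def differing_fields(a: RunConfig, b: RunConfig) -> set[str]:
    """Names of the fields whose values differ between two runs."""
    return {f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)}


finch_run = RunConfig(
    architecture="Finch (RWKV-6)",
    model_size="1.6B",
    corpus="1.12T-token multilingual corpus",
    tokenizer="greedy-matching tokenizer",
    schedule="<same schedule as the Finch 1.6B run>",  # placeholder, not from the paper
    seed=0,
)

# The control changes exactly one field and inherits everything else.
rwkv4_control = replace(finch_run, architecture="RWKV-4")

assert differing_fields(finch_run, rwkv4_control) == {"architecture"}
```

Building the control with `dataclasses.replace` rather than by hand makes it hard to change the corpus, tokenizer, or schedule by accident; any extra difference shows up in the set returned by `differing_fields`.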

Original abstract

We present Eagle (RWKV-5) and Finch (RWKV-6), sequence models improving upon the RWKV (RWKV-4) architecture. Our architectural design advancements include multi-headed matrix-valued states and a dynamic recurrence mechanism that improve expressivity while maintaining the inference efficiency characteristics of RNNs. We introduce a new multilingual corpus with 1.12 trillion tokens and a fast tokenizer based on greedy matching for enhanced multilinguality. We trained four Eagle models, ranging from 0.46 to 7.5 billion parameters, and two Finch models with 1.6 and 3.1 billion parameters and find that they achieve competitive performance across a wide variety of benchmarks. We release all our models on HuggingFace under the Apache 2.0 license. Models at: https://huggingface.co/RWKV Training code at: https://github.com/RWKV/RWKV-LM Inference code at: https://github.com/RWKV/ChatRWKV Time-parallel training code at: https://github.com/RWKV/RWKV-infctx-trainer
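The abstract describes the tokenizer only as "based on greedy matching". A minimal sketch of greedy longest-match tokenization is below: it walks the UTF-8 bytes of the input through a trie and, at each position, emits the longest vocabulary entry that matches. It assumes a vocabulary that includes every single byte as a fallback, so any text, in any script, can be encoded. This is not the released tokenizer's code, and the toy vocabulary in the usage note is invented for illustration.

```python
from typing import Dict, List


def build_trie(vocab: Dict[bytes, int]) -> dict:
    """Nested-dict trie over token byte strings; the key None marks the end of a token."""
    root: dict = {}
    for token, token_id in vocab.items():
        node = root
        for byte in token:
            node = node.setdefault(byte, {})
        node[None] = token_id
    return root


def greedy_tokenize(text: str, trie: dict) -> List[int]:
    """At each position, emit the id of the longest vocabulary token that matches."""
    data = text.encode("utf-8")
    ids: List[int] = []
    pos = 0
    while pos < len(data):
        node, best_id, best_end = trie, None, pos
        for end in range(pos, len(data)):
            node = node.get(data[end])
            if node is None:
                break  # no longer token can start with these bytes
            if None in node:
                best_id, best_end = node[None], end + 1
        if best_id is None:
            raise ValueError(f"no vocabulary entry matches byte {data[pos]} at offset {pos}")
        ids.append(best_id)
        pos = best_end
    return ids
```

With a toy vocabulary of all 256 single bytes (ids 0 to 255) plus `b"the"` (256), `b"th"` (257) and `b" cat"` (258), `greedy_tokenize("the cat", build_trie(vocab))` returns `[256, 258]`, and `"thx"` falls back to `[257, 120]`. Each step is a single left-to-right walk of the trie with no merge passes, which is one reason greedy matching can be fast.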

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes Eagle (RWKV-5) and Finch (RWKV-6) as advancements over the RWKV-4 architecture, featuring multi-headed matrix-valued states and a dynamic recurrence mechanism intended to increase expressivity while preserving linear-time RNN inference. It also introduces a new 1.12-trillion-token multilingual corpus and a greedy tokenizer. The authors train four Eagle models (0.46B to 7.5B parameters) and two Finch models (1.6B and 3.1B parameters) from scratch and report competitive results across benchmarks, with all models, training code, and inference code released publicly under Apache 2.0.

Significance. If the matrix-valued states and dynamic recurrence can be shown to drive the gains independently of the new corpus and tokenizer, the work would advance efficient sequence modeling by offering a scalable RNN alternative to transformers. The open release of models and code is a clear strength that aids reproducibility. The large multilingual corpus is a useful contribution. Significance is reduced, however, because the empirical results do not isolate the architectural changes from the new training distribution.

major comments (2)
  1. [Abstract and Experimental Results] The central empirical claim, that multi-headed matrix-valued states and dynamic recurrence produce competitive performance, is not supported by any ablation that holds the 1.12-trillion-token corpus, greedy tokenizer, and optimization schedule fixed while reverting only the state representation and recurrence rule to RWKV-4. All reported models were trained from scratch on the new data, so benchmark numbers cannot be causally attributed to the proposed architecture rather than to the scale or quality of the training distribution.
  2. [Abstract and Results] The abstract asserts that the models 'achieve competitive performance across a wide variety of benchmarks' yet supplies no tables, no direct comparisons to RWKV-4 or other models at matched parameter counts, no error bars, and no mention of multiple random seeds. This absence makes it impossible to evaluate the reliability or magnitude of the claimed improvements.
minor comments (2)
  1. [Introduction] Clarify the precise architectural distinctions between Eagle and Finch beyond the shared matrix-state and dynamic-recurrence features; the abstract introduces both but does not delineate their individual design choices.
  2. Add a short related-work paragraph situating the dynamic recurrence against prior linear-attention or state-space models to help readers assess novelty.
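Major comment 2 asks for error bars across random seeds. One conventional way to report them is the mean and the standard error of the mean over independent runs. The helper below is a sketch with an illustrative name; it uses the sample standard deviation (n − 1 denominator), so it refuses a single run, which carries no error estimate at all.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_and_stderr(scores: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard error of the mean over independent runs (e.g. seeds)."""
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate an error bar")
    return statistics.mean(scores), statistics.stdev(scores) / math.sqrt(len(scores))
```

On toy scores `[0.5, 0.6, 0.7]` (not results from the paper) it returns a mean of 0.6 and a standard error of about 0.058.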

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their constructive and detailed feedback. We address each major comment below, clarifying our experimental design and outlining planned revisions to the manuscript.

Point-by-point responses
  1. Referee: [Abstract and Experimental Results] The central empirical claim, that multi-headed matrix-valued states and dynamic recurrence produce competitive performance, is not supported by any ablation that holds the 1.12-trillion-token corpus, greedy tokenizer, and optimization schedule fixed while reverting only the state representation and recurrence rule to RWKV-4. All reported models were trained from scratch on the new data, so benchmark numbers cannot be causally attributed to the proposed architecture rather than to the scale or quality of the training distribution.

    Authors: We agree that an ablation isolating the architectural modifications from the new corpus and tokenizer would strengthen causal attribution. Retraining RWKV-4 from scratch on the full 1.12-trillion-token corpus under identical conditions is computationally prohibitive given current resources. In the revised manuscript we will add an explicit limitations paragraph discussing this constraint, clarify the independent contributions of the new multilingual corpus and greedy tokenizer, and include side-by-side comparisons against previously published RWKV-4 checkpoints wherever parameter counts and evaluation protocols align. The public release of all models, training code, and inference code enables the community to conduct further controlled experiments. revision: partial

  2. Referee: [Abstract and Results] The abstract asserts that the models 'achieve competitive performance across a wide variety of benchmarks' yet supplies no tables, no direct comparisons to RWKV-4 or other models at matched parameter counts, no error bars, and no mention of multiple random seeds. This absence makes it impossible to evaluate the reliability or magnitude of the claimed improvements.

    Authors: The abstract is intentionally concise; the full manuscript contains tables with direct comparisons to RWKV-4 and other models at matched scales, together with benchmark details, in the Experiments section. We will revise the abstract to reference these results more explicitly. Large-scale pretraining runs are customarily reported as single executions due to cost; we will add a clarifying statement in the revised text noting the single-run nature of the results and will include error bars for any smaller-scale ablations that become feasible. revision: yes

standing simulated objections not resolved
  • A full ablation study that retrains RWKV-4 on the identical 1.12-trillion-token corpus and tokenizer while reverting only the state and recurrence components, which remains infeasible due to training cost.

Circularity Check

0 steps flagged

No significant circularity; empirical training results are self-contained

full rationale

The paper introduces architectural changes (matrix-valued states, dynamic recurrence) to the RWKV model, describes a new training corpus and tokenizer, then reports benchmark scores obtained by training the new models from scratch. No derivation chain is present that reduces a claimed prediction or first-principles result to its own inputs by construction, self-definition, or a load-bearing self-citation. Performance numbers are produced by direct optimization and evaluation rather than by fitting a parameter and relabeling the fit as a prediction; the central claims therefore remain independent of the circularity patterns enumerated in the analysis criteria.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 0 invented entities

The central claim rests on standard deep-learning training assumptions plus the domain assumption that the new state and recurrence mechanisms increase expressivity without harming efficiency. No new physical entities are postulated. Free parameters include chosen model sizes and the scale of the newly assembled corpus.

free parameters (2)
  • model parameter counts
    0.46B–7.5B for Eagle and 1.6B/3.1B for Finch; chosen to span practical scales.
  • training corpus size
    1.12 trillion tokens assembled for this work.
axioms (1)
  • domain assumption: Matrix-valued multi-headed states and dynamic recurrence improve expressivity while preserving RNN inference characteristics.
    Invoked in the abstract's description of the architectural design.

pith-pipeline@v0.9.0 · 5634 in / 1459 out tokens · 41955 ms · 2026-05-17T23:15:46.657659+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • Cost.FunctionalEquation washburn_uniqueness_aczel · unclear

    Relation between the paper passage and the cited Recognition theorem.

    Our architectural design advancements include multi-headed matrix-valued states and a dynamic recurrence mechanism that improve expressivity while maintaining the inference efficiency characteristics of RNNs.

  • Foundation.HierarchyEmergence hierarchy_emergence_forces_phi · unclear

    Relation between the paper passage and the cited Recognition theorem.

    We introduce a new multilingual corpus with 1.12 trillion tokens and a fast tokenizer based on greedy matching for enhanced multilinguality.

  • Foundation.LawOfExistence law_of_existence · unclear

    Relation between the paper passage and the cited Recognition theorem.

    We trained four Eagle models, ranging from 0.46 to 7.5 billion parameters, and two Finch models with 1.6 and 3.1 billion parameters and find that they achieve competitive performance across a wide variety of benchmarks.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 19 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Test-Time Training with KV Binding Is Secretly Linear Attention

    cs.LG 2026-02 conditional novelty 8.0

    Test-time training with KV binding reduces to learned linear attention.

  2. Learning to (Learn at Test Time): RNNs with Expressive Hidden States

    cs.LG 2024-07 conditional novelty 8.0

    TTT layers treat the hidden state as a trainable model updated at test time, allowing linear-complexity sequence models to keep reducing perplexity as context length grows, unlike Mamba.

  3. Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

    cs.LG 2024-05 unverdicted novelty 7.0

    Transformers and SSMs are unified through structured state space duality, producing a 2-8X faster Mamba-2 model that remains competitive with Transformers.

  4. The Impossibility Triangle of Long-Context Modeling

    cs.CL 2026-05 unverdicted novelty 6.0

    No model can achieve efficiency, compactness, and recall capacity scaling with sequence length at once, as any two imply a strict bound of O(poly(d)/log V) on recallable facts.

  5. Co-Located Tests, Better AI Code: How Test Syntax Structure Affects Foundation Model Code Generation

    cs.SE 2026-04 unverdicted novelty 6.0

    Co-locating tests with implementation code yields substantially higher preservation and correctness in foundation-model-generated programs than separated test syntax.

  6. Learning to Adapt: In-Context Learning Beyond Stationarity

    cs.LG 2026-04 unverdicted novelty 6.0

    Gated linear attention enables lower training and test errors in non-stationary in-context learning by adaptively modulating past inputs through a learnable recency bias under an autoregressive model of task evolution.

  7. Optimal Decay Spectra for Linear Recurrences

    cs.LG 2026-04 unverdicted novelty 6.0

    PoST reparameterizes decay spectra in linear recurrences with geometric log-spacing and position-adaptive scaling to achieve O(exp(-cN/log t)) decay, improving zero-shot language modeling and long-context retrieval ac...

  8. Phase-Associative Memory: Sequence Modeling in Complex Hilbert Space

    cs.CL 2026-04 unverdicted novelty 6.0

    PAM, a complex-valued associative memory model, exhibits steeper power-law scaling in loss and perplexity than a matched real-valued baseline when trained on WikiText-103 from 5M to 100M parameters.

  9. Generative Event Pretraining with Foundation Model Alignment

    cs.CV 2026-03 unverdicted novelty 6.0

    GEP transfers semantic knowledge from image foundation models to event data via alignment and generative pretraining on mixed sequences to create transferable event-based visual models.

  10. Higher-order Linear Attention

    cs.LG 2025-10 unverdicted novelty 6.0

    Higher-order Linear Attention realizes second-order and higher interactions in linear-time causal attention via constant-size state and associative scans.

  11. MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention

    cs.CL 2025-06 unverdicted novelty 6.0

    MiniMax-M1 is a 456B parameter hybrid-attention MoE model trained with CISPO RL that achieves performance comparable or superior to DeepSeek-R1 and Qwen3-235B on reasoning and software engineering tasks while training...

  12. MoBA: Mixture of Block Attention for Long-Context LLMs

    cs.LG 2025-02 unverdicted novelty 6.0

    MoBA routes attention over blocks via MoE-style gating to enable dynamic, bias-light long-context attention that matches full attention performance at lower cost.

  13. Titans: Learning to Memorize at Test Time

    cs.LG 2024-12 unverdicted novelty 6.0

    Titans combine attention for current context with a learnable neural memory for long-term history, achieving better performance and scaling to over 2M-token contexts on language, reasoning, genomics, and time-series tasks.

  14. DataComp-LM: In search of the next generation of training sets for language models

    cs.LG 2024-06 unverdicted novelty 6.0

    DCLM-Baseline dataset lets a 7B model reach 64% 5-shot MMLU accuracy after 2.6T tokens, beating prior open-data models by 6.6 points on MMLU with 40% less compute.

  15. Lessons from the Trenches on Reproducible Evaluation of Language Models

    cs.CL 2024-05 accept novelty 6.0

    The paper compiles practical lessons on reproducible LM evaluation and introduces the lm-eval library to mitigate common methodological problems in NLP.

  16. Gated Linear Attention Transformers with Hardware-Efficient Training

    cs.LG 2023-12 unverdicted novelty 6.0

    Gated linear attention Transformers achieve competitive language modeling results with linear-time inference, superior length generalization, and higher training throughput than Mamba.

  17. MDN: Parallelizing Stepwise Momentum for Delta Linear Attention

    cs.LG 2026-05 unverdicted novelty 5.0

    MDN parallelizes stepwise momentum for delta linear attention using geometric reordering and dynamical systems analysis, yielding performance gains over Mamba2 and GDN on 400M and 1.3B models.

  18. Belief-State RWKV for Reinforcement Learning under Partial Observability

    cs.LG 2026-04 unverdicted novelty 5.0

    Belief-state RWKV maintains an uncertainty-aware recurrent state for RL policies in partial observability and shows modest gains over standard recurrent baselines in a pilot with observation noise.

  19. StateX: Enhancing RNN Recall via Post-training State Expansion

    cs.CL 2025-09 unverdicted novelty 5.0

    StateX post-trains RNNs to expand recurrent state size, improving recall and in-context learning with negligible parameter growth.

Reference graph

Works this paper leans on

59 extracted references · 59 canonical work pages · cited by 19 Pith papers · 2 internal anchors
