arxiv: 2404.05892 · v4 · pith:B7KD4YH3new · submitted 2024-04-08 · 💻 cs.CL · cs.AI

Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence

Bo Peng, Daniel Goldstein, Quentin Anthony, Alon Albalak, Eric Alcaide, Stella Biderman, Eugene Cheah, Xingjian Du, Teddy Ferdinan, Haowen Hou, Przemysław Kazienko, Kranthi Kiran GV, Jan Kocoń, Bartłomiej Koptyra, Satyapriya Krishna, Ronald McClelland Jr., Jiaju Lin, Niklas Muennighoff, Fares Obeid, Atsushi Saito, Guangyu Song, Haoqin Tu, Cahya Wirawan, Stanisław Woźniak, Ruichong Zhang, Bingchen Zhao, Qihang Zhao, Peng Zhou, Jian Zhu, Rui-Jie Zhu


Pith reviewed 2026-05-17 23:15 UTC · model grok-4.3

classification 💻 cs.CL cs.AI

keywords RWKV, recurrent neural networks, language models, matrix-valued states, dynamic recurrence, multilingual corpus, efficient inference, sequence modeling


The pith

Matrix-valued states and dynamic recurrence let updated RWKV models reach competitive benchmark performance while keeping RNN inference speed.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Eagle and Finch as the next versions of the RWKV architecture. Both replace the vector-valued states of RWKV-4 with multi-headed matrix-valued states, and Finch adds a dynamic (data-dependent) recurrence mechanism. Together these increase what the model can express without giving up the linear-time inference of recurrent networks. The authors also introduce a new 1.12-trillion-token multilingual corpus and a fast tokenizer. They train models from 0.46B to 7.5B parameters and report that the resulting systems match or approach other current models across many standard language benchmarks.

Core claim

Eagle (RWKV-5) and Finch (RWKV-6) achieve competitive performance on language benchmarks by replacing the vector-valued states of RWKV-4 with multi-headed matrix-valued states and introducing a dynamic recurrence mechanism that improves expressivity while preserving the inference efficiency of RNNs.

What carries the argument

Multi-headed matrix-valued states combined with a dynamic recurrence mechanism that replaces the fixed, input-independent decay of earlier RWKV versions.
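A minimal sketch of this mechanism for a single head follows. It assumes the formulation in the paper's architecture section rather than anything stated on this page. Each head keeps an N×N state S (N = D/h). S is decayed channel-wise and updated with the outer product k^T v, and the current token receives a separate bonus u:

- wkv_t = diag(u) k_t^T v_t + S_{t-1}
- o_t = r_t wkv_t
- S_t = diag(w_t) S_{t-1} + k_t^T v_t

A decay w_t that is the same at every step corresponds to Eagle. Letting w_t vary with the input is Finch's dynamic recurrence. The function names, the exp(-exp(.)) squashing used to generate example decays, and the omission of the output gate, group norm and token shift are illustrative simplifications, not the released RWKV code.

```python
import math
import random


def outer(k, v):
    """Outer product k^T v of two length-N vectors, as an N x N nested list."""
    return [[ki * vj for vj in v] for ki in k]


def wkv_recurrent(r, k, v, w, u):
    """One head's matrix-valued WKV, computed step by step (illustrative sketch).

    r, k, v, w: lists of T vectors of length N (N = head size).
    w[t] holds per-channel decays in (0, 1); u is the current-token bonus.
    """
    n = len(u)
    state = [[0.0] * n for _ in range(n)]  # N x N; size does not grow with T
    outputs = []
    for rt, kt, vt, wt in zip(r, k, v, w):
        kv = outer(kt, vt)
        # wkv_t = diag(u) k_t^T v_t + S_{t-1}
        wkv = [[u[i] * kv[i][j] + state[i][j] for j in range(n)] for i in range(n)]
        # o_t = r_t wkv_t (row vector times matrix)
        outputs.append([sum(rt[i] * wkv[i][j] for i in range(n)) for j in range(n)])
        # S_t = diag(w_t) S_{t-1} + k_t^T v_t
        state = [[wt[i] * state[i][j] + kv[i][j] for j in range(n)] for i in range(n)]
    return outputs


def wkv_parallel(r, k, v, w, u):
    """Same quantity from the explicit decayed sum over past tokens (O(T^2) reference)."""
    n = len(u)
    outputs = []
    for t in range(len(r)):
        wkv = [[u[i] * k[t][i] * v[t][j] for j in range(n)] for i in range(n)]
        for s in range(t):
            for i in range(n):
                # Product of the decays applied between step s and step t-1
                decay = math.prod(w[m][i] for m in range(s + 1, t))
                for j in range(n):
                    wkv[i][j] += decay * k[s][i] * v[s][j]
        outputs.append([sum(r[t][i] * wkv[i][j] for i in range(n)) for j in range(n)])
    return outputs


def random_vectors(count, size):
    return [[random.uniform(-1, 1) for _ in range(size)] for _ in range(count)]


random.seed(0)
T, N = 6, 4  # sequence length, head size
r, k, v = (random_vectors(T, N) for _ in range(3))
u = random_vectors(1, N)[0]
static_w = [[0.9] * N for _ in range(T)]  # decay constant over time (Eagle-like)
# Illustrative data-dependent decays, squashed into (0, 1) via exp(-exp(.))
dynamic_w = [[math.exp(-math.exp(random.uniform(-3, 0))) for _ in range(N)] for _ in range(T)]
for w in (static_w, dynamic_w):
    rec, par = wkv_recurrent(r, k, v, w, u), wkv_parallel(r, k, v, w, u)
    print(max(abs(x - y) for ra, rb in zip(rec, par) for x, y in zip(ra, rb)))
```

The two functions agree to floating-point precision. The recurrent form touches only the fixed N×N state at each step, while the explicit-sum form shows the decayed sum over all past tokens that the recurrence folds into S.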

If this is right

Inference remains linear in sequence length (constant time and memory per token), so longer contexts become practical without the quadratic cost of attention.
The new state representation may allow scaling to larger parameter counts while retaining the memory and speed advantages of RNN-style models.
Multilingual capabilities improve through the dedicated large corpus and tokenizer, potentially reducing the need for separate language-specific models.
Open release of models, training code, and inference code enables direct replication and further architectural experiments by others.
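The constant-memory part of the first point can be checked with arithmetic. The paper's appendix gives the total Eagle/Finch internal state as #(State) = L(2D + D²/h) = 66DL, with L layers, width D and h heads. Per layer, the state is one token-shift vector of length D each for time mixing and channel mixing, plus h WKV matrices of size (D/h)×(D/h). The 66DL simplification implies a head size D/h = 64. The sketch below takes that as its default, which is an assumption for any other configuration. The function name is hypothetical.

```python
def rwkv_state_size(n_layer: int, d_model: int, head_size: int = 64) -> int:
    """Values in the Eagle/Finch recurrent state for one sequence (sketch).

    Per layer: two token-shift vectors of length D (time mixing and channel
    mixing) plus h = D / head_size WKV matrices of head_size x head_size.
    Sequence length does not appear anywhere.
    """
    if d_model % head_size != 0:
        raise ValueError("d_model must be a multiple of head_size")
    n_heads = d_model // head_size
    per_layer = 2 * d_model + n_heads * head_size * head_size
    return n_layer * per_layer


print(rwkv_state_size(n_layer=24, d_model=1024))  # 1622016
```

For Eagle 0.4B (L = 24, D = 1024), this gives 1,622,016 values, the figure listed in the paper's model table. The count is the same whether the context is 100 tokens or 100,000.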

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the dynamic recurrence generalizes, similar state expansions could be tested in other recurrent architectures to close the remaining gap with attention-based models.
The matrix state design might change how scaling laws apply, because the recurrent state now holds richer information than the vector-valued states of RWKV-4 allowed.
These changes could extend usefully to non-text sequences such as audio or time-series data where maintaining long-range dependencies with low compute is valuable.

Load-bearing premise

The observed benchmark gains come primarily from the matrix-valued states and dynamic recurrence rather than from the scale of the new training corpus or tokenizer changes.

What would settle it

Train an RWKV-4 model on the identical 1.12-trillion-token corpus and tokenizer, then compare its benchmark scores directly to the Eagle and Finch models of similar size.
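The following sketch makes concrete what reverting the state representation to RWKV-4 would restore. It shows the per-channel WKV recurrence from the original RWKV-4 paper for a single channel:

- The whole state is two running sums (a, b) with a fixed, input-independent decay w > 0.
- The output is a softmax-like weighted average of past values, plus a bonus u for the current token.

This is the naive form. RWKV-4's kernels also track a running maximum exponent for numerical stability, which is omitted here. The function name is hypothetical.

```python
import math


def rwkv4_wkv(k, v, w, u):
    """Naive RWKV-4 WKV for one channel over a sequence of scalars (sketch).

    k, v: per-token key and value scalars; w > 0: fixed decay rate;
    u: bonus added to the current token's key.
    """
    a, b = 0.0, 0.0  # decayed sums of e^{k_i} v_i and e^{k_i}: the entire state
    out = []
    for kt, vt in zip(k, v):
        e_now = math.exp(u + kt)
        out.append((a + e_now * vt) / (b + e_now))
        # Decay the past by e^{-w}, then absorb the current token
        a = math.exp(-w) * a + math.exp(kt) * vt
        b = math.exp(-w) * b + math.exp(kt)
    return out


print(rwkv4_wkv([0.1, 0.2, -0.3], [1.0, 3.0, -2.0], w=0.5, u=0.3))
```

Each output is a convex combination of the values seen so far. The per-channel state is two numbers, against a (D/h)×(D/h) matrix per head in Eagle and Finch. The proposed control run would isolate exactly this difference.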

Original abstract

We present Eagle (RWKV-5) and Finch (RWKV-6), sequence models improving upon the RWKV (RWKV-4) architecture. Our architectural design advancements include multi-headed matrix-valued states and a dynamic recurrence mechanism that improve expressivity while maintaining the inference efficiency characteristics of RNNs. We introduce a new multilingual corpus with 1.12 trillion tokens and a fast tokenizer based on greedy matching for enhanced multilinguality. We trained four Eagle models, ranging from 0.46 to 7.5 billion parameters, and two Finch models with 1.6 and 3.1 billion parameters and find that they achieve competitive performance across a wide variety of benchmarks. We release all our models on HuggingFace under the Apache 2.0 license.
Models at: https://huggingface.co/RWKV
Training code at: https://github.com/RWKV/RWKV-LM
Inference code at: https://github.com/RWKV/ChatRWKV
Time-parallel training code at: https://github.com/RWKV/RWKV-infctx-trainer
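The abstract says only that the tokenizer is "based on greedy matching". The sketch below illustrates that idea under stated assumptions. It assumes a byte-level vocabulary that contains every single byte, so matching can never get stuck. From the current position, it walks a trie and emits the longest vocabulary entry that matches. The toy vocabulary and function names are hypothetical, and this is not the RWKV tokenizer implementation.

```python
def build_trie(vocab):
    """Build a nested-dict trie mapping byte sequences to token ids."""
    root = {}
    for token_id, token in enumerate(vocab):
        node = root
        for byte in token:  # iterating over bytes yields ints 0-255
            node = node.setdefault(byte, {})
        node[None] = token_id  # key None marks the end of a vocabulary entry
    return root


def greedy_tokenize(text, trie):
    """Greedy longest-match tokenization of UTF-8 text."""
    data = text.encode("utf-8")
    ids, pos = [], 0
    while pos < len(data):
        node, best_id, best_end, i = trie, None, pos, pos
        # Follow the trie as far as the input allows, remembering the last full match.
        while i < len(data) and data[i] in node:
            node = node[data[i]]
            i += 1
            if None in node:
                best_id, best_end = node[None], i
        if best_id is None:
            raise ValueError(f"no vocabulary entry matches at byte offset {pos}")
        ids.append(best_id)
        pos = best_end
    return ids


# Hypothetical toy vocabulary: all 256 single bytes, then a few multi-byte entries.
vocab = [bytes([b]) for b in range(256)] + [b"th", b"the", b" cat"]
trie = build_trie(vocab)
print(greedy_tokenize("the cat", trie))  # [257, 258]
```

Greedy longest match needs only one left-to-right pass with no merge-rule search. It can, however, choose a different segmentation than a BPE tokenizer would for the same text.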

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes Eagle (RWKV-5) and Finch (RWKV-6) as advancements over the RWKV-4 architecture, featuring multi-headed matrix-valued states and a dynamic recurrence mechanism intended to increase expressivity while preserving linear-time RNN inference. It also introduces a new 1.12-trillion-token multilingual corpus and a greedy tokenizer. The authors train four Eagle models (0.46B to 7.5B parameters) and two Finch models (1.6B and 3.1B parameters) from scratch and report competitive results across benchmarks, with all models, training code, and inference code released publicly under Apache 2.0.

Significance. If the matrix-valued states and dynamic recurrence can be shown to drive the gains independently of the new corpus and tokenizer, the work would advance efficient sequence modeling by offering a scalable RNN alternative to transformers. The open release of models and code is a clear strength that aids reproducibility. The large multilingual corpus is a useful contribution. Significance is reduced, however, because the empirical results do not isolate the architectural changes from the new training distribution.

major comments (2)

[Abstract and Experimental Results] The central empirical claim, that multi-headed matrix-valued states and dynamic recurrence produce competitive performance, is not supported by any ablation that holds the 1.12T-token corpus, greedy tokenizer, and optimization schedule fixed while reverting only the state representation and recurrence rule to RWKV-4. All reported models were trained from scratch on the new data, so benchmark numbers cannot be causally attributed to the proposed architecture rather than to the scale or quality of the training distribution.
[Abstract and Results] The abstract asserts that the models 'achieve competitive performance across a wide variety of benchmarks' yet supplies no tables, no direct comparisons to RWKV-4 or other models at matched parameter counts, no error bars, and no mention of multiple random seeds. This absence makes it impossible to evaluate the reliability or magnitude of the claimed improvements.

minor comments (2)

[Introduction] Clarify the precise architectural distinctions between Eagle and Finch beyond the shared matrix-state and dynamic-recurrence features; the abstract introduces both but does not delineate their individual design choices.
Add a short related-work paragraph situating the dynamic recurrence against prior linear-attention or state-space models to help readers assess novelty.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their constructive and detailed feedback. We address each major comment below, clarifying our experimental design and outlining planned revisions to the manuscript.

Point-by-point responses

Referee: [Abstract and Experimental Results] The central empirical claim, that multi-headed matrix-valued states and dynamic recurrence produce competitive performance, is not supported by any ablation that holds the 1.12T-token corpus, greedy tokenizer, and optimization schedule fixed while reverting only the state representation and recurrence rule to RWKV-4. All reported models were trained from scratch on the new data, so benchmark numbers cannot be causally attributed to the proposed architecture rather than to the scale or quality of the training distribution.

Authors: We agree that an ablation isolating the architectural modifications from the new corpus and tokenizer would strengthen causal attribution. Retraining RWKV-4 from scratch on the full 1.12-trillion-token corpus under identical conditions is computationally prohibitive given current resources. In the revised manuscript we will add an explicit limitations paragraph discussing this constraint, clarify the independent contributions of the new multilingual corpus and greedy tokenizer, and include side-by-side comparisons against previously published RWKV-4 checkpoints wherever parameter counts and evaluation protocols align. The public release of all models, training code, and inference code enables the community to conduct further controlled experiments. revision: partial
Referee: [Abstract and Results] The abstract asserts that the models 'achieve competitive performance across a wide variety of benchmarks' yet supplies no tables, no direct comparisons to RWKV-4 or other models at matched parameter counts, no error bars, and no mention of multiple random seeds. This absence makes it impossible to evaluate the reliability or magnitude of the claimed improvements.

Authors: The abstract is intentionally concise; the full manuscript contains tables with direct comparisons to RWKV-4 and other models at matched scales, together with benchmark details, in the Experiments section. We will revise the abstract to reference these results more explicitly. Large-scale pretraining runs are customarily reported as single executions due to cost; we will add a clarifying statement in the revised text noting the single-run nature of the results and will include error bars for any smaller-scale ablations that become feasible. revision: yes

standing simulated objections not resolved

A full ablation study that retrains RWKV-4 on the identical 1.12-trillion-token corpus and tokenizer while reverting only the state and recurrence components, which remains infeasible due to training cost.

Circularity Check

0 steps flagged

No significant circularity; empirical training results are self-contained

full rationale

The paper introduces architectural changes (matrix-valued states, dynamic recurrence) to the RWKV model, describes a new training corpus and tokenizer, then reports benchmark scores obtained by training the new models from scratch. No derivation chain is present that reduces a claimed prediction or first-principles result to its own inputs by construction, self-definition, or a load-bearing self-citation. Performance numbers are produced by direct optimization and evaluation rather than by fitting a parameter and relabeling the fit as a prediction; the central claims therefore remain independent of the circularity patterns enumerated in the analysis criteria.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard deep-learning training assumptions plus the domain assumption that the new state and recurrence mechanisms increase expressivity without harming efficiency. No new physical entities are postulated. Free parameters include chosen model sizes and the scale of the newly assembled corpus.

free parameters (2)

model parameter counts
0.46B–7.5B for Eagle and 1.6B/3.1B for Finch; chosen to span practical scales.
training corpus size
1.12 trillion tokens assembled for this work.

axioms (1)

domain assumption Matrix-valued multi-headed states and dynamic recurrence improve expressivity while preserving RNN inference characteristics.
Invoked in the abstract's statement of the architectural design advancements.

pith-pipeline@v0.9.0 · 5634 in / 1459 out tokens · 41955 ms · 2026-05-17T23:15:46.657659+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Cost.FunctionalEquation washburn_uniqueness_aczel unclear


Our architectural design advancements include multi-headed matrix-valued states and a dynamic recurrence mechanism that improve expressivity while maintaining the inference efficiency characteristics of RNNs.
Foundation.HierarchyEmergence hierarchy_emergence_forces_phi unclear


We introduce a new multilingual corpus with 1.12 trillion tokens and a fast tokenizer based on greedy matching for enhanced multilinguality.
Foundation.LawOfExistence law_of_existence unclear


We trained four Eagle models, ranging from 0.46 to 7.5 billion parameters, and two Finch models with 1.6 and 3.1 billion parameters and find that they achieve competitive performance across a wide variety of benchmarks.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 19 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Test-Time Training with KV Binding Is Secretly Linear Attention
cs.LG 2026-02 conditional novelty 8.0

Test-time training with KV binding reduces to learned linear attention.
Learning to (Learn at Test Time): RNNs with Expressive Hidden States
cs.LG 2024-07 conditional novelty 8.0

TTT layers treat the hidden state as a trainable model updated at test time, allowing linear-complexity sequence models to scale perplexity reduction with context length unlike Mamba.
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
cs.LG 2024-05 unverdicted novelty 7.0

Transformers and SSMs are unified through structured state space duality, producing a 2-8X faster Mamba-2 model that remains competitive with Transformers.
The Impossibility Triangle of Long-Context Modeling
cs.CL 2026-05 unverdicted novelty 6.0

No model can achieve efficiency, compactness, and recall capacity scaling with sequence length at once, as any two imply a strict bound of O(poly(d)/log V) on recallable facts.
Co-Located Tests, Better AI Code: How Test Syntax Structure Affects Foundation Model Code Generation
cs.SE 2026-04 unverdicted novelty 6.0

Co-locating tests with implementation code yields substantially higher preservation and correctness in foundation-model-generated programs than separated test syntax.
Learning to Adapt: In-Context Learning Beyond Stationarity
cs.LG 2026-04 unverdicted novelty 6.0

Gated linear attention enables lower training and test errors in non-stationary in-context learning by adaptively modulating past inputs through a learnable recency bias under an autoregressive model of task evolution.
Optimal Decay Spectra for Linear Recurrences
cs.LG 2026-04 unverdicted novelty 6.0

PoST reparameterizes decay spectra in linear recurrences with geometric log-spacing and position-adaptive scaling to achieve O(exp(-cN/log t)) decay, improving zero-shot language modeling and long-context retrieval ac...
Phase-Associative Memory: Sequence Modeling in Complex Hilbert Space
cs.CL 2026-04 unverdicted novelty 6.0

PAM, a complex-valued associative memory model, exhibits steeper power-law scaling in loss and perplexity than a matched real-valued baseline when trained on WikiText-103 from 5M to 100M parameters.
Generative Event Pretraining with Foundation Model Alignment
cs.CV 2026-03 unverdicted novelty 6.0

GEP transfers semantic knowledge from image foundation models to event data via alignment and generative pretraining on mixed sequences to create transferable event-based visual models.
Higher-order Linear Attention
cs.LG 2025-10 unverdicted novelty 6.0

Higher-order Linear Attention realizes second-order and higher interactions in linear-time causal attention via constant-size state and associative scans.
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
cs.CL 2025-06 unverdicted novelty 6.0

MiniMax-M1 is a 456B parameter hybrid-attention MoE model trained with CISPO RL that achieves performance comparable or superior to DeepSeek-R1 and Qwen3-235B on reasoning and software engineering tasks while training...
MoBA: Mixture of Block Attention for Long-Context LLMs
cs.LG 2025-02 unverdicted novelty 6.0

MoBA routes attention over blocks via MoE-style gating to enable dynamic, bias-light long-context attention that matches full attention performance at lower cost.
Titans: Learning to Memorize at Test Time
cs.LG 2024-12 unverdicted novelty 6.0

Titans combine attention for current context with a learnable neural memory for long-term history, achieving better performance and scaling to over 2M-token contexts on language, reasoning, genomics, and time-series tasks.
DataComp-LM: In search of the next generation of training sets for language models
cs.LG 2024-06 unverdicted novelty 6.0

DCLM-Baseline dataset lets a 7B model reach 64% 5-shot MMLU accuracy after 2.6T tokens, beating prior open-data models by 6.6 points on MMLU with 40% less compute.
Lessons from the Trenches on Reproducible Evaluation of Language Models
cs.CL 2024-05 accept novelty 6.0

The paper compiles practical lessons on reproducible LM evaluation and introduces the lm-eval library to mitigate common methodological problems in NLP.
Gated Linear Attention Transformers with Hardware-Efficient Training
cs.LG 2023-12 unverdicted novelty 6.0

Gated linear attention Transformers achieve competitive language modeling results with linear-time inference, superior length generalization, and higher training throughput than Mamba.
MDN: Parallelizing Stepwise Momentum for Delta Linear Attention
cs.LG 2026-05 unverdicted novelty 5.0

MDN parallelizes stepwise momentum for delta linear attention using geometric reordering and dynamical systems analysis, yielding performance gains over Mamba2 and GDN on 400M and 1.3B models.
Belief-State RWKV for Reinforcement Learning under Partial Observability
cs.LG 2026-04 unverdicted novelty 5.0

Belief-state RWKV maintains an uncertainty-aware recurrent state for RL policies in partial observability and shows modest gains over standard recurrent baselines in a pilot with observation noise.
StateX: Enhancing RNN Recall via Post-training State Expansion
cs.CL 2025-09 unverdicted novelty 5.0

StateX post-trains RNNs to expand recurrent state size, improving recall and in-context learning with negligible parameter growth.

Reference graph

Works this paper leans on

59 extracted references · 59 canonical work pages · cited by 19 Pith papers · 2 internal anchors
