Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
Pith reviewed 2026-05-17 23:15 UTC · model grok-4.3
The pith
Matrix-valued states and dynamic recurrence let updated RWKV models reach competitive benchmark performance while keeping RNN inference speed.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Eagle (RWKV-5) and Finch (RWKV-6) achieve competitive performance on language benchmarks by replacing the original RWKV scalar states with multi-headed matrix-valued states and introducing a dynamic recurrence mechanism that improves expressivity while preserving the inference efficiency of RNNs.
What carries the argument
Multi-headed matrix-valued states combined with a dynamic recurrence mechanism that replaces the fixed scalar recurrence of earlier RWKV versions.
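To make the mechanism concrete, here is a minimal single-head sketch of a recurrence over a matrix-valued state with a per-token decay. It is a simplified illustration, not the paper's exact equations: the names `r`, `k`, `v`, `w` and all function names are assumptions, and details such as multiple heads, normalization, and how the decay is computed from the input are omitted. The naive function shows that the recurrent form computes the same result as an explicit decayed sum over all earlier tokens.

```python
import random


def matrix_state_step(state, r, k, v, w):
    """Advance one head by one token.

    state: d x d nested list (the matrix-valued state)
    r, k, v: length-d vectors (illustrative names, assumed here)
    w: length-d decay in (0, 1); passing a different w for each token
       models a data-dependent ("dynamic") recurrence
    """
    d = len(k)
    # Decay each row of the old state, then add the outer product k v^T.
    new_state = [[w[a] * state[a][c] + k[a] * v[c] for c in range(d)]
                 for a in range(d)]
    # Read the state out with r (vector-matrix product).
    out = [sum(r[a] * new_state[a][c] for a in range(d)) for c in range(d)]
    return out, new_state


def recurrent_forward(rs, ks, vs, ws):
    """Process a sequence token by token with a fixed-size state."""
    d = len(ks[0])
    state = [[0.0] * d for _ in range(d)]
    outs = []
    for r, k, v, w in zip(rs, ks, vs, ws):
        out, state = matrix_state_step(state, r, k, v, w)
        outs.append(out)
    return outs, state


def naive_forward(rs, ks, vs, ws):
    """Reference: explicit sum over earlier tokens (quadratic in length)."""
    T, d = len(ks), len(ks[0])
    outs = []
    for t in range(T):
        out = [0.0] * d
        for i in range(t + 1):
            # Per-channel decay accumulated between token i and token t.
            decay = [1.0] * d
            for j in range(i + 1, t + 1):
                decay = [decay[a] * ws[j][a] for a in range(d)]
            score = sum(rs[t][a] * decay[a] * ks[i][a] for a in range(d))
            out = [out[c] + score * vs[i][c] for c in range(d)]
        outs.append(out)
    return outs


def random_sequence(T, d, seed=0):
    """Generate toy inputs; the values carry no meaning."""
    rng = random.Random(seed)

    def vec(lo, hi):
        return [rng.uniform(lo, hi) for _ in range(d)]

    rs = [vec(-1.0, 1.0) for _ in range(T)]
    ks = [vec(-1.0, 1.0) for _ in range(T)]
    vs = [vec(-1.0, 1.0) for _ in range(T)]
    ws = [vec(0.5, 1.0) for _ in range(T)]
    return rs, ks, vs, ws
```

The per-token cost of `matrix_state_step` depends only on `d`, not on how many tokens came before, which is the property that keeps inference RNN-like.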
If this is right
- The architecture remains linear in sequence length at inference time, so longer contexts become practical without quadratic cost.
- The new state representation may allow scaling to larger parameter counts while retaining the memory and speed advantages of RNN-style models.
- Multilingual capabilities improve through the dedicated large corpus and tokenizer, potentially reducing the need for separate language-specific models.
- Open release of models, training code, and inference code enables direct replication and further architectural experiments by others.
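The memory side of the first two points can be illustrated with simple arithmetic. The sketch below counts elements per layer, assuming each head stores one square matrix of side `d_model / n_heads` and comparing against a standard attention key/value cache. The dimensions used are hypothetical, and the count ignores any smaller auxiliary state a real model keeps.

```python
def matrix_state_elements(d_model, n_heads):
    """Elements in one layer's matrix-valued state (assumed layout:
    one head_dim x head_dim matrix per head)."""
    if d_model % n_heads != 0:
        raise ValueError("d_model must be divisible by n_heads")
    head_dim = d_model // n_heads
    return n_heads * head_dim * head_dim


def kv_cache_elements(d_model, n_tokens):
    """Elements in one attention layer's key/value cache after n_tokens."""
    return 2 * n_tokens * d_model


if __name__ == "__main__":
    d_model, n_heads = 1024, 16  # hypothetical dimensions
    state = matrix_state_elements(d_model, n_heads)
    for n_tokens in (32, 1024, 4096):
        cache = kv_cache_elements(d_model, n_tokens)
        print(n_tokens, state, cache)
```

The recurrent state stays the same size regardless of context length, while the cache grows in proportion to the number of tokens processed.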
Where Pith is reading between the lines
- If the dynamic recurrence generalizes, similar state expansions could be tested in other recurrent architectures to close the remaining gap with attention-based models.
- The matrix state design might change how scaling laws apply, because each token now carries richer internal information than scalar states allowed.
- These changes could extend usefully to non-text sequences such as audio or time-series data where maintaining long-range dependencies with low compute is valuable.
Load-bearing premise
The observed benchmark gains come primarily from the matrix-valued states and dynamic recurrence rather than from the scale of the new training corpus or tokenizer changes.
What would settle it
Train an RWKV-4 model on the identical 1.12-trillion-token corpus and tokenizer, then compare its benchmark scores directly to the Eagle and Finch models of similar size.
Original abstract
We present Eagle (RWKV-5) and Finch (RWKV-6), sequence models improving upon the RWKV (RWKV-4) architecture. Our architectural design advancements include multi-headed matrix-valued states and a dynamic recurrence mechanism that improve expressivity while maintaining the inference efficiency characteristics of RNNs. We introduce a new multilingual corpus with 1.12 trillion tokens and a fast tokenizer based on greedy matching for enhanced multilinguality. We trained four Eagle models, ranging from 0.46 to 7.5 billion parameters, and two Finch models with 1.6 and 3.1 billion parameters and find that they achieve competitive performance across a wide variety of benchmarks. We release all our models on HuggingFace under the Apache 2.0 license.
- Models: https://huggingface.co/RWKV
- Training code: https://github.com/RWKV/RWKV-LM
- Inference code: https://github.com/RWKV/ChatRWKV
- Time-parallel training code: https://github.com/RWKV/RWKV-infctx-trainer
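The abstract describes the tokenizer only as "based on greedy matching". One common reading, assumed here, is longest-prefix matching against a fixed vocabulary, which a trie makes efficient. The sketch below uses a toy character-level vocabulary; the function names, the vocabulary, and the choice to work on characters rather than bytes are all illustrative assumptions, not the released tokenizer.

```python
def build_trie(vocab):
    """Build a nested-dict trie from a {token_string: token_id} mapping.

    The key None marks the end of a token and stores its id.
    """
    root = {}
    for token, token_id in vocab.items():
        node = root
        for ch in token:
            node = node.setdefault(ch, {})
        node[None] = token_id
    return root


def greedy_encode(text, trie):
    """Encode text by repeatedly taking the longest matching token."""
    ids = []
    pos = 0
    while pos < len(text):
        node = trie
        best_id, best_end = None, pos
        i = pos
        # Walk the trie as far as the text allows, remembering the
        # last position at which a complete token ended.
        while i < len(text) and text[i] in node:
            node = node[text[i]]
            i += 1
            if None in node:
                best_id, best_end = node[None], i
        if best_id is None:
            raise ValueError(f"no token matches at position {pos}")
        ids.append(best_id)
        pos = best_end
    return ids


if __name__ == "__main__":
    toy_vocab = {"a": 0, "b": 1, "ab": 2, "abc": 3, "c": 4}  # hypothetical
    trie = build_trie(toy_vocab)
    print(greedy_encode("abcab", trie))
```

Greedy matching needs no merge table and makes a single left-to-right pass, which is one plausible reason such a tokenizer would be fast.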
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Eagle (RWKV-5) and Finch (RWKV-6) as advancements over the RWKV-4 architecture, featuring multi-headed matrix-valued states and a dynamic recurrence mechanism intended to increase expressivity while preserving linear-time RNN inference. It also introduces a new 1.12-trillion-token multilingual corpus and a greedy tokenizer. The authors train four Eagle models (0.46B to 7.5B parameters) and two Finch models (1.6B and 3.1B parameters) from scratch and report competitive results across benchmarks, with all models, training code, and inference code released publicly under Apache 2.0.
Significance. If the matrix-valued states and dynamic recurrence can be shown to drive the gains independently of the new corpus and tokenizer, the work would advance efficient sequence modeling by offering a scalable RNN alternative to transformers. The open release of models and code is a clear strength that aids reproducibility. The large multilingual corpus is a useful contribution. Significance is reduced, however, because the empirical results do not isolate the architectural changes from the new training distribution.
major comments (2)
- [Abstract and Experimental Results] The central empirical claim, that multi-headed matrix-valued states and dynamic recurrence produce competitive performance, is not supported by any ablation that holds the 1.12-trillion-token corpus, greedy tokenizer, and optimization schedule fixed while reverting only the state representation and recurrence rule to RWKV-4. All reported models were trained from scratch on the new data, so benchmark numbers cannot be causally attributed to the proposed architecture rather than to scale or quality of the training distribution.
- [Abstract and Results] The abstract asserts that the models 'achieve competitive performance across a wide variety of benchmarks' yet supplies no tables, no direct comparisons to RWKV-4 or other models at matched parameter counts, no error bars, and no mention of multiple random seeds. This absence makes it impossible to evaluate the reliability or magnitude of the claimed improvements.
minor comments (2)
- [Introduction] Clarify the precise architectural distinctions between Eagle and Finch beyond the shared matrix-state and dynamic-recurrence features; the abstract introduces both but does not delineate their individual design choices.
- Add a short related-work paragraph situating the dynamic recurrence against prior linear-attention or state-space models to help readers assess novelty.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback. We address each major comment below, clarifying our experimental design and outlining planned revisions to the manuscript.
- Referee: [Abstract and Experimental Results] The central empirical claim, that multi-headed matrix-valued states and dynamic recurrence produce competitive performance, is not supported by any ablation that holds the 1.12-trillion-token corpus, greedy tokenizer, and optimization schedule fixed while reverting only the state representation and recurrence rule to RWKV-4. All reported models were trained from scratch on the new data, so benchmark numbers cannot be causally attributed to the proposed architecture rather than to scale or quality of the training distribution.
  Authors: We agree that an ablation isolating the architectural modifications from the new corpus and tokenizer would strengthen causal attribution. Retraining RWKV-4 from scratch on the full 1.12-trillion-token corpus under identical conditions is computationally prohibitive given current resources. In the revised manuscript we will add an explicit limitations paragraph discussing this constraint, clarify the independent contributions of the new multilingual corpus and greedy tokenizer, and include side-by-side comparisons against previously published RWKV-4 checkpoints wherever parameter counts and evaluation protocols align. The public release of all models, training code, and inference code enables the community to conduct further controlled experiments.
  Revision: partial
- Referee: [Abstract and Results] The abstract asserts that the models 'achieve competitive performance across a wide variety of benchmarks' yet supplies no tables, no direct comparisons to RWKV-4 or other models at matched parameter counts, no error bars, and no mention of multiple random seeds. This absence makes it impossible to evaluate the reliability or magnitude of the claimed improvements.
  Authors: The abstract is intentionally concise; the full manuscript contains tables with direct comparisons to RWKV-4 and other models at matched scales, together with benchmark details, in the Experiments section. We will revise the abstract to reference these results more explicitly. Large-scale pretraining runs are customarily reported as single executions due to cost; we will add a clarifying statement in the revised text noting the single-run nature of the results and will include error bars for any smaller-scale ablations that become feasible.
  Revision: yes
- A full ablation study that retrains RWKV-4 on the identical 1.12-trillion-token corpus and tokenizer while reverting only the state and recurrence components, which remains infeasible due to training cost.
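The error bars mentioned in the second response amount to reporting a mean and a spread over repeated runs. A minimal sketch follows, assuming one scalar metric (for example a validation loss) per random seed; the function name is hypothetical and the input values are placeholders, not results from the paper.

```python
import math
import statistics


def mean_and_stderr(scores):
    """Return the mean and standard error of per-seed scores."""
    if len(scores) < 2:
        raise ValueError("need at least two seeds to estimate spread")
    mean = statistics.fmean(scores)
    # Sample standard deviation divided by sqrt(n) gives the standard error.
    stderr = statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, stderr


if __name__ == "__main__":
    # Placeholder values for three hypothetical seeds (not from the paper).
    placeholder_scores = [1.0, 2.0, 3.0]
    mean, stderr = mean_and_stderr(placeholder_scores)
    print(f"{mean:.3f} +/- {stderr:.3f}")
```

With a single run per model, as in the large-scale experiments, no such spread can be computed, which is why the rebuttal limits the commitment to smaller ablations.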
Circularity Check
No significant circularity; empirical training results are self-contained
The paper introduces architectural changes (matrix-valued states, dynamic recurrence) to the RWKV model, describes a new training corpus and tokenizer, then reports benchmark scores obtained by training the new models from scratch. No derivation chain is present that reduces a claimed prediction or first-principles result to its own inputs by construction, self-definition, or a load-bearing self-citation. Performance numbers are produced by direct optimization and evaluation rather than by fitting a parameter and relabeling the fit as a prediction; the central claims therefore remain independent of the circularity patterns enumerated in the analysis criteria.
Axiom & Free-Parameter Ledger
free parameters (2)
- model parameter counts
- training corpus size
axioms (1)
- domain assumption: Matrix-valued multi-headed states and dynamic recurrence improve expressivity while preserving RNN inference characteristics.
Lean theorems connected to this paper
- Cost.FunctionalEquation, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: Our architectural design advancements include multi-headed matrix-valued states and a dynamic recurrence mechanism that improve expressivity while maintaining the inference efficiency characteristics of RNNs.
- Foundation.HierarchyEmergence, hierarchy_emergence_forces_phi (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: We introduce a new multilingual corpus with 1.12 trillion tokens and a fast tokenizer based on greedy matching for enhanced multilinguality.
- Foundation.LawOfExistence, law_of_existence (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: We trained four Eagle models, ranging from 0.46 to 7.5 billion parameters, and two Finch models with 1.6 and 3.1 billion parameters and find that they achieve competitive performance across a wide variety of benchmarks.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 19 Pith papers
- Test-Time Training with KV Binding Is Secretly Linear Attention
  Test-time training with KV binding reduces to learned linear attention.
- Learning to (Learn at Test Time): RNNs with Expressive Hidden States
  TTT layers treat the hidden state as a trainable model updated at test time, allowing linear-complexity sequence models to scale perplexity reduction with context length unlike Mamba.
- Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
  Transformers and SSMs are unified through structured state space duality, producing a 2-8X faster Mamba-2 model that remains competitive with Transformers.
- The Impossibility Triangle of Long-Context Modeling
  No model can achieve efficiency, compactness, and recall capacity scaling with sequence length at once, as any two imply a strict bound of O(poly(d)/log V) on recallable facts.
- Co-Located Tests, Better AI Code: How Test Syntax Structure Affects Foundation Model Code Generation
  Co-locating tests with implementation code yields substantially higher preservation and correctness in foundation-model-generated programs than separated test syntax.
- Learning to Adapt: In-Context Learning Beyond Stationarity
  Gated linear attention enables lower training and test errors in non-stationary in-context learning by adaptively modulating past inputs through a learnable recency bias under an autoregressive model of task evolution.
- Optimal Decay Spectra for Linear Recurrences
  PoST reparameterizes decay spectra in linear recurrences with geometric log-spacing and position-adaptive scaling to achieve O(exp(-cN/log t)) decay, improving zero-shot language modeling and long-context retrieval ac...
- Phase-Associative Memory: Sequence Modeling in Complex Hilbert Space
  PAM, a complex-valued associative memory model, exhibits steeper power-law scaling in loss and perplexity than a matched real-valued baseline when trained on WikiText-103 from 5M to 100M parameters.
- Generative Event Pretraining with Foundation Model Alignment
  GEP transfers semantic knowledge from image foundation models to event data via alignment and generative pretraining on mixed sequences to create transferable event-based visual models.
- Higher-order Linear Attention
  Higher-order Linear Attention realizes second-order and higher interactions in linear-time causal attention via constant-size state and associative scans.
- MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
  MiniMax-M1 is a 456B parameter hybrid-attention MoE model trained with CISPO RL that achieves performance comparable or superior to DeepSeek-R1 and Qwen3-235B on reasoning and software engineering tasks while training...
- MoBA: Mixture of Block Attention for Long-Context LLMs
  MoBA routes attention over blocks via MoE-style gating to enable dynamic, bias-light long-context attention that matches full attention performance at lower cost.
- Titans: Learning to Memorize at Test Time
  Titans combine attention for current context with a learnable neural memory for long-term history, achieving better performance and scaling to over 2M-token contexts on language, reasoning, genomics, and time-series tasks.
- DataComp-LM: In search of the next generation of training sets for language models
  DCLM-Baseline dataset lets a 7B model reach 64% 5-shot MMLU accuracy after 2.6T tokens, beating prior open-data models by 6.6 points on MMLU with 40% less compute.
- Lessons from the Trenches on Reproducible Evaluation of Language Models
  The paper compiles practical lessons on reproducible LM evaluation and introduces the lm-eval library to mitigate common methodological problems in NLP.
- Gated Linear Attention Transformers with Hardware-Efficient Training
  Gated linear attention Transformers achieve competitive language modeling results with linear-time inference, superior length generalization, and higher training throughput than Mamba.
- MDN: Parallelizing Stepwise Momentum for Delta Linear Attention
  MDN parallelizes stepwise momentum for delta linear attention using geometric reordering and dynamical systems analysis, yielding performance gains over Mamba2 and GDN on 400M and 1.3B models.
- Belief-State RWKV for Reinforcement Learning under Partial Observability
  Belief-state RWKV maintains an uncertainty-aware recurrent state for RL policies in partial observability and shows modest gains over standard recurrent baselines in a pilot with observation noise.
- StateX: Enhancing RNN Recall via Post-training State Expansion
  StateX post-trains RNNs to expand recurrent state size, improving recall and in-context learning with negligible parameter growth.