pith. machine review for the scientific record.

arxiv: 2604.04936 · v1 · submitted 2026-01-08 · 💻 cs.IR · cs.AI

Recognition: 2 theorem links


Web Retrieval-Aware Chunking (W-RAC) for Efficient and Cost-Effective Retrieval-Augmented Generation Systems

Authors on Pith: no claims yet

Pith reviewed 2026-05-16 16:43 UTC · model grok-4.3

classification 💻 cs.IR · cs.AI
keywords chunking · retrieval-augmented generation · RAG · web documents · LLM efficiency · cost reduction · document processing

The pith

W-RAC chunks web documents for RAG by grouping ID-addressable units with LLMs instead of generating text, matching retrieval quality at 10x lower cost

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Web Retrieval-Aware Chunking, or W-RAC, a framework for preparing web documents in retrieval-augmented generation systems. It parses web content into structured, ID-addressable units and then uses large language models solely to decide how to group those units for optimal retrieval, avoiding any text generation or rewriting. This design targets the high costs and scalability problems of traditional chunking methods like fixed-size or agentic approaches. A reader would care because it promises to make large-scale web ingestion for RAG practical by cutting chunking expenses dramatically while keeping or improving search relevance.
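
To make the idea of an ID-addressable unit concrete, here is a minimal sketch in Python. The field names (id, type, text, parent_heading) follow the prompt excerpts quoted in the reference graph below; the sample values and the exact shape are illustrative assumptions, not the paper's published schema.

    # Minimal sketch of ID-addressable units parsed from a web page.
    # Field names follow the prompt excerpts quoted in the reference graph below;
    # the sample values are illustrative, not taken from the paper.
    units = [
        {"id": "heading_1", "type": "heading", "text": "EXCESS BAGGAGE CHARGES",
         "parent_heading": None},
        {"id": "heading_2", "type": "heading", "text": "Packing heavy?",
         "parent_heading": "heading_1"},
        {"id": "text_3", "type": "text",
         "text": "(body text extracted from the page, stored verbatim)",
         "parent_heading": "heading_2"},
    ]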

Core claim

W-RAC decouples text extraction from semantic chunk planning by representing parsed web content as structured, ID-addressable units and leveraging large language models only for retrieval-aware grouping decisions rather than text generation. This significantly reduces token usage, eliminates hallucination risks, and improves system observability. Experimental analysis demonstrates that W-RAC achieves comparable or better retrieval performance than traditional chunking approaches while reducing chunking-related LLM costs by an order of magnitude.
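
A back-of-envelope reading of where an order-of-magnitude saving could come from: a chunker that re-emits or rewrites document text pays output tokens roughly proportional to document length, often over several passes, while a grouping-only chunker's output is just a short list of IDs. The token counts, pass count, and prices below are assumptions made for the sketch, not figures reported in the paper.

    # Illustrative arithmetic only; token counts, pass count, and prices are
    # assumptions, not figures reported in the paper.
    doc_tokens  = 8_000   # parsed page sent to the model as input
    id_plan_out = 300     # grouping-only output: JSON lists of unit IDs
    passes      = 3       # assumed overlapping passes for an agentic rewriter
    price_in, price_out = 0.5, 1.5   # hypothetical $ per 1M tokens

    grouping_only = (doc_tokens * price_in + id_plan_out * price_out) / 1e6
    agentic       = passes * (doc_tokens * price_in + doc_tokens * price_out) / 1e6
    print(f"grouping-only: ${grouping_only:.6f}/page  "
          f"agentic rewrite: ${agentic:.6f}/page  "
          f"ratio: {agentic / grouping_only:.1f}x")

Under these particular assumptions the ratio lands near 10x; with a single-pass rewriter or cheaper output tokens it would be smaller, which is one reason the referee report below asks for measured numbers.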

What carries the argument

Structured ID-addressable units from parsed web content, which allow LLMs to perform retrieval-aware grouping decisions without generating or rewriting text
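
A hedged sketch of what that division of labor could look like: the model sees only unit IDs plus light metadata and returns groups of IDs in the {"chunks": [["id1", ...], ...]} shape quoted in the appendix prompt excerpts below, and chunk text is then assembled verbatim from the parsed units. The llm_call argument and the prompt wording here are placeholders, not the paper's exact implementation.

    import json

    def plan_chunks(units, llm_call):
        """Ask an LLM to plan chunks as groups of unit IDs; assemble text locally."""
        # Only IDs and light structure go to the model; full text is never rewritten.
        listing = [{"id": u["id"], "type": u["type"],
                    "parent_heading": u.get("parent_heading"),
                    "preview": u["text"][:80]} for u in units]
        # Placeholder prompt; parts of the paper's actual grouping prompt are
        # quoted in the reference graph below.
        prompt = ("Group these units into retrieval-friendly chunks. "
                  'Return JSON only: {"chunks": [["id1", "id2"], ...]}\n'
                  + json.dumps(listing))
        groups = json.loads(llm_call(prompt))["chunks"]
        by_id = {u["id"]: u for u in units}
        # Each chunk is a verbatim concatenation of stored unit text addressed by ID,
        # so nothing is generated and every grouping decision is auditable.
        return ["\n".join(by_id[i]["text"] for i in group) for group in groups]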

If this is right

  • Retrieval quality remains comparable or superior to fixed-size or rule-based chunking
  • Chunking costs drop by roughly 10 times, enabling larger document sets
  • System observability increases because decisions operate on explicit IDs rather than generated text
  • Scalability improves for web-scale ingestion without redundant processing

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar structuring could apply to other document types beyond web pages if parsers exist
  • Integration with existing RAG pipelines might require only a new chunker module
  • Long-term this could lower barriers to deploying RAG on dynamic web content
  • Future work might test it on multilingual web data or very large sites

Load-bearing premise

That LLM decisions on groupings of ID-addressable structured units can preserve retrieval quality as well as methods that generate or analyze full semantic text

What would settle it

A side-by-side retrieval accuracy test on a large web corpus where W-RAC grouping yields measurably lower relevance scores than agentic chunking on the same queries

read the original abstract

Retrieval-Augmented Generation (RAG) systems critically depend on effective document chunking strategies to balance retrieval quality, latency, and operational cost. Traditional chunking approaches, such as fixed-size, rule-based, or fully agentic chunking, often suffer from high token consumption, redundant text generation, limited scalability, and poor debuggability, especially for large-scale web content ingestion. In this paper, we propose Web Retrieval-Aware Chunking (W-RAC), a novel, cost-efficient chunking framework designed specifically for web-based documents. W-RAC decouples text extraction from semantic chunk planning by representing parsed web content as structured, ID-addressable units and leveraging large language models (LLMs) only for retrieval-aware grouping decisions rather than text generation. This significantly reduces token usage, eliminates hallucination risks, and improves system observability. Experimental analysis and architectural comparison demonstrate that W-RAC achieves comparable or better retrieval performance than traditional chunking approaches while reducing chunking-related LLM costs by an order of magnitude.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes Web Retrieval-Aware Chunking (W-RAC), a framework for web-based RAG that represents parsed content as structured, ID-addressable units and restricts LLM use to retrieval-aware grouping decisions rather than text generation. This is claimed to reduce token consumption and hallucination risks while achieving comparable or better retrieval performance than fixed-size, rule-based, or agentic chunking at roughly 10x lower LLM cost, supported by experimental analysis and architectural comparison.

Significance. If the performance and cost claims are substantiated, W-RAC would address a practical bottleneck in large-scale web ingestion for RAG by improving efficiency and debuggability without sacrificing retrieval quality. The approach's emphasis on observability and reduced generation is a clear engineering strength, but the manuscript supplies no datasets, metrics, baselines, or quantitative results, so the significance cannot be evaluated from the provided text.
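
For concreteness, the kind of side-by-side measurement that is missing could look like the minimal recall@K harness below, run over the same labeled queries against two chunk indexes built from the same pages. The retrievers, relevance labels, and corpus are placeholders; the manuscript reports no such experiment.

    # Minimal recall@K harness; retrievers, labels, and corpora are placeholders.
    def recall_at_k(retrieved_ids, relevant_ids, k=5):
        return len(set(retrieved_ids[:k]) & set(relevant_ids)) / max(len(relevant_ids), 1)

    def mean_recall(retrieve, labeled_queries, k=5):
        """retrieve(query) -> ranked chunk IDs; labeled_queries: query -> relevant IDs."""
        scores = [recall_at_k(retrieve(q), rel, k) for q, rel in labeled_queries.items()]
        return sum(scores) / len(scores)

    # Hypothetical usage: same queries, two indexes built from the same pages.
    # mean_recall(wrac_index.search, labeled_queries)
    # mean_recall(agentic_index.search, labeled_queries)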

major comments (2)
  1. [Abstract] Abstract: the claim of 'experimental analysis and architectural comparison' demonstrating comparable or better retrieval performance and an order-of-magnitude cost reduction is unsupported; no datasets, evaluation metrics (e.g., recall@K, nDCG), baselines, or error analysis are described anywhere in the manuscript.
  2. [Method] Method section (implied by the architectural description): the premise that ID-addressable structured units plus retrieval-aware prompts suffice for coherent, high-recall chunks is load-bearing for the headline claim, yet the manuscript provides no evidence that this representation preserves implicit cross-references, layout-dependent semantics, or long-range dependencies typical of web pages; if the LLM cannot recover these from stripped metadata, downstream retrieval quality will degrade.
minor comments (2)
  1. Provide concrete examples of the ID-addressable unit schema and the exact retrieval-aware prompts used for grouping decisions.
  2. Clarify how W-RAC handles dynamic web elements (e.g., JavaScript-rendered content) that may not be captured in the initial parsed representation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We acknowledge that the submitted manuscript does not contain quantitative experiments or supporting evidence for the performance claims and have revised the abstract and method sections accordingly to remove overstated assertions while adding clarifying examples and limitations discussion.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the claim of 'experimental analysis and architectural comparison' demonstrating comparable or better retrieval performance and an order-of-magnitude cost reduction is unsupported; no datasets, evaluation metrics (e.g., recall@K, nDCG), baselines, or error analysis are described anywhere in the manuscript.

    Authors: We agree that the manuscript provides no datasets, metrics, baselines, or quantitative results. The abstract's reference to experimental analysis was imprecise and referred only to qualitative architectural reasoning. We have revised the abstract to eliminate all specific performance and cost claims, describing W-RAC instead as a framework whose design goals include reduced token usage and improved observability. A new section outlining planned evaluation metrics (including recall@K and nDCG) and baselines has been added. revision: yes

  2. Referee: [Method] Method section (implied by the architectural description): the premise that ID-addressable structured units plus retrieval-aware prompts suffice for coherent, high-recall chunks is load-bearing for the headline claim, yet the manuscript provides no evidence that this representation preserves implicit cross-references, layout-dependent semantics, or long-range dependencies typical of web pages; if the LLM cannot recover these from stripped metadata, downstream retrieval quality will degrade.

    Authors: We accept that the original description lacked concrete evidence or examples for preservation of cross-references and layout semantics. We have expanded the Method section with specific examples showing how ID-addressable units and metadata fields encode layout information and cross-references. A new limitations paragraph has also been added discussing cases where long-range dependencies may not be fully recovered and how prompt design attempts to mitigate this. revision: partial

Circularity Check

0 steps flagged

No circularity in W-RAC architectural proposal

full rationale

The paper proposes W-RAC as a framework that represents parsed web content as structured ID-addressable units and restricts LLM use to retrieval-aware grouping decisions rather than text generation. No equations, fitted parameters, or derivations appear in the provided text. Central claims rest on the architectural decoupling and experimental comparisons, which are independent of any self-referential loop or input renaming. No self-citations are invoked as load-bearing justification for uniqueness or ansatz choices. This is a standard non-circular proposal of a new method.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract introduces no explicit free parameters, mathematical axioms, or new postulated entities; the contribution is an engineering framework whose validity rests on unstated experimental validation.

pith-pipeline@v0.9.0 · 5492 in / 984 out tokens · 45692 ms · 2026-05-16T16:43:06.877620+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages · 9 internal anchors

  1. [1]

    Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems, 33:9459–9474, 2020

    Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems, 33:9459–9474, 2020

  2. [2]

    A Survey on Multimodal Large Language Models

    Shukang Yin, Chaoyou Fu, Sirui Zhao, Ke Li, Xing Sun, Tong Xu, and Enhong Chen. A survey on multimodal large language models. arXiv preprint arXiv:2306.13549, 2023

  3. [3]

    Visrag: Vision-based retrieval-augmented generation on multi-modal large language models. arXiv preprint arXiv:2410.10117, 2024

    Xinyu Chen, Yuhan Wang, Ziliang Zhao, Haotian Wan, and Yong Zhang. Visrag: Vision-based retrieval-augmented generation on multi-modal large language models. arXiv preprint arXiv:2410.10117, 2024

  4. [4]

    Video-rag: Visually-aligned retrieval-augmented long video comprehension. arXiv preprint, 2024

    Yongdong Zhang, Jiaqi Wu, Hao Zhao, Kai Wang, Mingqian Liu, Jun Dong, Jianbo Xu, Yiran Wang, and Fuzheng Shen. Videorag: Visually-aligned retrieval-augmented long video understanding. arXiv preprint arXiv:2411.13093, 2024

  5. [5]

    Layoutlm: Pre-training of text and layout for document image understanding. arXiv preprint arXiv:1912.13318, 2020

    Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, and Ming Zhou. Layoutlm: Pre-training of text and layout for document image understanding. arXiv preprint arXiv:1912.13318, 2020

  6. [6]

    Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, et al. Layoutlmv2: Multi-modal pre-training for visually-rich document understanding. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language ...

  7. [7]

    Docvqa: A dataset for vqa on document images

    Minesh Mathew, Dimosthenis Karatzas, and CV Jawahar. Docvqa: A dataset for vqa on document images. In Proceedings of the IEEE/CVF winter conference on applications of computer vision, pages 2200–2209, 2021

  8. [8]

    Dense Passage Retrieval for Open-Domain Question Answering

    Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. Dense passage retrieval for open-domain question answering. arXiv preprint arXiv:2004.04906, 2020

  9. [9]

    Hybrid retrieval-generation reinforced agent for medical image report generation

    Christy Y Li, Xiaodan Liang, Zhiting Hu, and Eric P Xing. Hybrid retrieval-generation reinforced agent for medical image report generation. In Advances in Neural Information Processing Systems, volume 31, 2018

  10. [10]

    Passage Re-ranking with BERT

    Rodrigo Nogueira and Kyunghyun Cho. Passage re-ranking with BERT. arXiv preprint arXiv:1901.04085, 2019

  11. [11]

    Hotpotqa: A dataset for diverse, explainable multi-hop question answering

    Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William W Cohen, Ruslan Salakhutdinov, and Christopher D Manning. Hotpotqa: A dataset for diverse, explainable multi-hop question answering. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2369–2380, 2018

  12. [12]

    Gemini: A Family of Highly Capable Multimodal Models

    Gemini Team, Rohan Anil, Sebastian Borgeaud, Yonghui Wu, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M Dai, Anja Hauth, et al. Gemini: a family of highly capable multimodal models. arXiv preprint arXiv:2312.11805, 2023

  13. [13]

    GPT-4 Technical Report

    Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. Gpt-4 technical report. arXiv preprint arXiv:2303.08774, 2023

  14. [14]

    Sentence-bert: Sentence embeddings using siamese bert-networks

    Nils Reimers and Iryna Gurevych. Sentence-bert: Sentence embeddings using siamese bert-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3982–3992, 2019

  15. [15]

    Elasticsearch: The definitive guide, 2015

    Clinton Gormley and Zachary Tong. Elasticsearch: The definitive guide, 2015

  16. [16]

    Text and Code Embeddings by Contrastive Pre-Training

    Arvind Neelakantan, Tao Xu, Raul Puri, Alec Radford, Jesse Michael Han, Jerry Tworek, Qiming Yuan, Nikolas Tezak, Jong Wook Kim, Chris Hallacy, et al. Text and code embeddings by contrastive pre-training. arXiv preprint arXiv:2201.10005, 2022

  17. [17]

    Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena

    Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric Xing, et al. Judging llm-as-a-judge with mt-bench and chatbot arena. arXiv preprint arXiv:2306.05685, 2023

  18. [18]

    Ragas: Automated Evaluation of Retrieval Augmented Generation

    Shahul Es, Jithin James, Luis Espinosa-Anke, and Steven Schockaert. Ragas: Automated evaluation of retrieval augmented generation. arXiv preprint arXiv:2309.15217, 2023

  19. [19]

    The use of mmr, diversity-based reranking for reordering documents and producing summaries

    Jaime Carbonell and Jade Goldstein. The use of mmr, diversity-based reranking for reordering documents and producing summaries. In Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval, pages 335–336, 1998

  20. [20]

    An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

    Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020

  21. [21]

    Beyond extraction: Contextualising tabular data for efficient summarisation by language models, 2024

    Uday Allu, Biddwan Ahmed, and Vishesh Tripathi. Beyond extraction: Contextualising tabular data for efficient summarisation by language models, 2024

  22. [22]

    Internvl: Scaling up vision foundation models and aligning for generic visual-linguistic tasks, 2024

    Zhe Chen, Jiannan Wu, Wenhai Wang, Weijie Su, Guo Chen, Sen Xing, Muyan Zhong, Qinglong Zhang, Xizhou Zhu, Lewei Lu, Bin Li, Ping Luo, Tong Lu, Yu Qiao, and Jifeng Dai. Internvl: Scaling up vision foundation models and aligning for generic visual-linguistic tasks, 2024

  23. [23]

    Nikolaos Livathinos, Christoph Auer, Maksym Lysak, Ahmed Nassar, Michele Dolfi, Panos Vagenas, Cesar Berrospi Ramis, Matteo Omenetti, Kasper Dinkla, Yusik Kim, Shubham Gupta, Rafael Teixeira de Lima, Valery Weber, Lucas Morin, Ingmar Meijer, Viktor Kuropiatnyk, and Peter W. J. Staar. Docling: An efficient open-source toolkit for ai-driven document conve...

  24. [24]

    Vision-guided chunking is all you need: Enhancing rag with multimodal document understanding, 2025

    Vishesh Tripathi, Tanmay Odapally, Indraneel Das, Uday Allu, and Biddwan Ahmed. Vision-guided chunking is all you need: Enhancing RAG with multimodal document understanding, 2025.

    Appendix A.1, W-RAC Prompt (Chunk Grouping and Hierarchical Structuring): You are tasked with processing an array of document chunks representing text sections, headings,...

  25. [25]

    Three-Level Heading Hierarchy Build a complete heading hierarchy tree by tracing parent_heading relationships upward. Every chunk group must include exactly 3 levels: • Level 1: Top-level/root heading - document title or highest-level heading that encompasses the content’s topic • Level 2: Mid-level parent heading - intermediate heading or reuse Level 1 • Le...

  26. [26]

    heading_66

    Parent Headings with Multiple Children When a parent heading has multiple child sections, include the parent heading ID in EACH child group array. Never output parent headings as standalone arrays when they have multiple children. Example: ["heading_66", "heading_67", "text_68"] and ["heading_66", "heading_80", "text_81"] (heading_66 appears in both)

  27. [27]

    Steps to

    Procedural Content NEVER split procedural steps, instructions, or sequential numbered/bulleted lists across multiple chunks. When content represents a procedure, process, or step-by-step instructions (e.g. “Steps to...”, numbered steps 1, 2, 3...), group ALL steps together in a SINGLE chunk array, even if they have individual headings or are numbered separa...

  28. [28]

    Context & Merging • Use heading hierarchy, parent_heading, and title fields to map structure • If parent_heading is None but structure shows hierarchy, infer parent-child relationships from sequential patterns • For small chunks (≤2 lines) missing context, merge with title/heading/adjacent chunks • Include relevant titles/headings with dependent content •...

  29. [29]

    Filtering Remove: cookies, page navigation, logins

  30. [30]

    Output Rules • Output only chunk IDs (no text modifications) • Each array must contain at least one heading/title or sufficient context • Merge small contextless fragments—never output standalone arrays for them PROCESSING STEPS

  31. [31]

    Use title if context is ambiguous

    Map heading hierarchy using parent_heading relationships. Use title if context is ambiguous

  32. [32]

    These MUST be grouped together in a single chunk

    Identify procedural content: Detect step-by-step instructions, numbered procedures, or sequential processes. These MUST be grouped together in a single chunk

  33. [33]

    Fill missing levels with best-matching existing heading ID

    For each chunk, trace 3 heading levels (L3→L2→L1). Fill missing levels with best-matching existing heading ID

  34. [34]

    Identify parent headings with multiple children—include in ALL child arrays

  35. [35]

    Process chunks: merge small/contextless chunks using title/headings; ensure 3-level hierarchy; include parent in child groups; keep all procedural steps together

  36. [36]

    Group into logical/topical arrays with 3-level hierarchy

  37. [37]

    chunks": [[

    Output JSON without backticks and code blocks: {"chunks": [["id1", "id2", "id3"], ...]} EXAMPLES Example 1: Missing Level Input: [ {"id": "heading_1", "type": "heading", "text": "EXCESS BAGGAGE CHARGES", "parent_heading": null}, {"id": "heading_2", "type": "heading", "text": "Packing heavy?", "parent_heading": "EXCESS BAGGAGE CHARGES"}, {"id": "text_3", "t...

  38. [38]

    Reading and Understanding Read all markdown content carefully

  39. [39]

    Features

    Heading Structure Always generate a 2 or 3-level heading structure for every chunk. Keep similar chunks under the same headings: • First-level heading: Document or product title • Second-level heading: Major section inside the document (e.g., “Features”, “Amenities”, “Itinerary”) • Third-level heading: Specific subtopic within that section

  40. [40]

    All text, hyperlinks, links, formatting, images, image links, and elements must remain exactly as in the original markdown and present in the output chunks

    Content Preservation DO NOT alter, paraphrase, shorten, or skip any markdown content. All text, hyperlinks, links, formatting, images, image links, and elements must remain exactly as in the original markdown and present in the output chunks

  41. [41]

    Keep similar chunks together in same headings or use just two levels of headings

    Chunking Strategy Do not over chunk. Keep similar chunks together in same headings or use just two levels of headings

  42. [42]

    Grouping Related Content Keep all related content together: • Always keep full numbered lists, bullet points, and related paragraphs in the same chunk • Never split tables, figures, code blocks, or other complete elements

  43. [43]

    OUTPUT REQUIREMENTS Output a list of chunks where each chunk starts with a full 2 or 3-level heading and remove all empty or no-finding chunks

    Table Formatting When working with tables: Format using proper markdown table syntax (pipes | and hyphens -). OUTPUT REQUIREMENTS Output a list of chunks where each chunk starts with a full 2 or 3-level heading and remove all empty or no-finding chunks. Use this exact format: [HEAD]main_heading > section_heading > chunk_heading[/HEAD] chunk content 1 [HEA...