Scaling Accessible Mathematics on arXiv: HTML Conversion and MathML 4

Brian Caruso; Bruce Miller; Deyan Ginev; Jacob Weiskoff; Jeff Sank

arxiv: 2605.16562 · v1 · pith:ZNTBEPEW · submitted 2026-05-15 · 💻 cs.CL · cs.DL


Pith reviewed 2026-05-20 17:49 UTC · model grok-4.3

classification 💻 cs.CL cs.DL

keywords HTML conversion · MathML 4 · mathematical accessibility · LaTeX processing · speech output · conversion fidelity · Rust implementation · user feedback


The pith

Converting LaTeX mathematics papers to HTML is advancing with MathML 4 annotations to improve accessibility and speech output.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper reports progress on HTML versions of LaTeX-based mathematics papers on arXiv, aimed at improving accessibility. Community feedback has helped resolve about half of roughly six thousand user reports on fidelity and service issues. Corpus-scale work aims to raise the error-free conversion rate from seventy-five to ninety percent, which means cutting the failure rate from twenty-five to ten percent, i.e. eliminating three in five of the conversions that currently fail. Initial MathML 4 Intent annotations support better speech rendering of mathematical expressions. An ongoing Rust port of the LaTeXML converter aims to cut compute costs and speed up previews at submission time. Readers should care because these changes could make technical papers easier to access for everyone, particularly people who use screen readers.

Core claim

The paper's core claim is that the HTML conversion service for TeX and LaTeX submissions has matured through community input and technical upgrades: it now includes early MathML 4 features for accessible speech, targets a ninety percent error-free rate, and stands to benefit from an in-progress Rust reimplementation of the underlying converter, LaTeXML.

What carries the argument

MathML 4 Intent annotations that encode semantic information in mathematical expressions to guide accessible speech generation.
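To make the idea concrete, the sketch below builds a MathML fragment for a matrix transpose A^T that carries an `intent="transpose($m)"` attribute, with the base element marked `arg="m"`, and pairs it with a toy speech function. The functions `transpose_mathml` and `speak` and their speech rule are hypothetical and far simpler than any real speech engine or LaTeXML's output; the sketch only shows how a declared intent lets the same layout be read as "transpose of A" instead of "A superscript T".

```python
import xml.etree.ElementTree as ET


def transpose_mathml(base: str) -> ET.Element:
    """Build <msup> for base^T with a MathML 4 intent naming the 'transpose' concept."""
    msup = ET.Element("msup", {"intent": "transpose($m)"})
    ET.SubElement(msup, "mi", {"arg": "m"}).text = base  # the argument referenced as $m
    ET.SubElement(msup, "mi").text = "T"                  # layout only; no argument role
    return msup


def speak(node: ET.Element) -> str:
    """Toy speech rule (hypothetical): honour a one-argument 'name($arg)' intent,
    otherwise fall back to reading the visual layout."""
    intent = node.get("intent")
    if intent and intent.endswith(")") and "($" in intent:
        name, ref = intent[:-1].split("($", 1)
        args = [child for child in node if child.get("arg") == ref]
        if args:
            return f"{name} of {speak(args[0])}"
    if node.tag == "msup":
        base, script = list(node)
        return f"{speak(base)} superscript {speak(script)}"
    return node.text or ""


if __name__ == "__main__":
    annotated = transpose_mathml("A")
    plain = ET.fromstring("<msup><mi>A</mi><mi>T</mi></msup>")
    print(ET.tostring(annotated, encoding="unicode"))
    print(speak(annotated))  # transpose of A
    print(speak(plain))      # A superscript T
```

Real intent processing also covers nested expressions, properties, and fallbacks; the point here is only that the annotation, not the visual layout, determines what gets spoken.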

If this is right

Resolving user reports leads to higher fidelity in HTML renderings of complex formulas.
MathML 4 annotations enable more natural speech output from mathematical content.
The Rust port reduces the computational resources needed for conversions.
Faster previews become possible during the submission process.
Reaching the ninety percent target supports providing HTML for nearly all new papers.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar annotation techniques could apply to other scientific document formats beyond mathematics.
Integration with emerging AI methods might further automate error detection in conversions.
Tracking adoption rates of the HTML versions would test the practical impact of these changes.
Lower costs could eventually extend the service to converting historical papers as well.

Load-bearing premise

That the sample of user reports and the current conversion statistics capture the main difficulties across the full set of submitted documents, with no major problems overlooked.

What would settle it

A survey of a random sample of papers that found persistent conversion failures in more than ten percent of cases, even after the described improvements, would show that the ninety percent error-free target has not been met.
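One way to run such a check is to audit a random sample and compare a confidence interval for the error-free rate against the target. The sketch below uses the Wilson score interval at roughly 95% confidence (z = 1.96); the `verdict` decision rule, the function names, and the sample counts are hypothetical, and a real audit would need the project's own definition of "error-free".

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95% confidence)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def verdict(successes: int, n: int, target: float = 0.90) -> str:
    """Hypothetical decision rule: compare the interval with the error-free target."""
    lo, hi = wilson_interval(successes, n)
    if hi < target:
        return "below target"
    if lo >= target:
        return "meets target"
    return "inconclusive"


if __name__ == "__main__":
    # Hypothetical audit: 300 of 400 sampled papers converted without errors (75%).
    print(wilson_interval(300, 400))  # roughly (0.705, 0.790)
    print(verdict(300, 400))          # below target
```

With 300 error-free papers out of 400, the whole interval lies below 0.90, so the target would clearly not yet be met. A sample whose observed rate sits near 90% (say 90 of 100) yields an interval that straddles the target, so the verdict is inconclusive; the sample size therefore matters as much as the observed rate.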

Original abstract

We report on the ongoing development of arXiv's HTML Papers offering, available on every new TeX/LaTeX submission since its initial release in 2023. The main highlights from 2025 and early 2026 are: (i) community-driven improvements to HTML fidelity and service health, with roughly half of 6,000 user reports resolved; (ii) corpus-scale conversion work aimed at 90% error-free HTML (currently 75%); (iii) initial MathML 4 Intent annotations for accessible speech output; (iv) an in-progress Rust port of LaTeXML, reducing compute costs and enabling faster previews on submission. The arXiv HTML Papers project remains experimental, but is gradually maturing as we better understand the needs of arXiv's readers and the technical opportunities presented by new standards and by advances in programming languages and AI.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)


This paper is a status report on arXiv's HTML conversion project, with new steps on MathML 4 and a Rust port, but its progress metrics lack supporting detail. The main thing to know is that it documents engineering work to make mathematics papers more accessible on the web; it does not present new scientific results or test a hypothesis. The authors have started adding MathML 4 Intent annotations to support better speech output for screen readers, and they are porting LaTeXML to Rust to cut compute costs and speed up previews on submission. These are practical next steps after the 2023 launch of the HTML service. The report also notes that community input has helped resolve about half of 6,000 user reports on fidelity and service health, and that conversions are currently 75% error-free against a goal of 90%. That shows steady iteration based on real usage.

The project does well at highlighting concrete milestones in accessibility and infrastructure. Responding to thousands of user reports and targeting higher success rates at corpus scale is useful work for open science. The soft spot is the thin evidence behind those numbers. The abstract states the percentages and the count of resolved reports but gives no methodology, no breakdown of error types, and no discussion of what might block the 90% target across the LaTeX corpus. Without that, it is hard to judge how solid the claims are or what the remaining failure modes look like.

This kind of update is for readers who follow digital publishing, web standards for mathematics, or accessibility in STEM. Anyone working on document conversion tools or repository infrastructure would find the current state and plans relevant. It deserves serious refereeing because the underlying effort addresses a real access problem with measurable steps forward. I recommend sending it to peer review so referees can comment on the technical choices and push for clearer reporting of the metrics.

Referee Report

1 major / 3 minor

Summary. The manuscript is a status report on the arXiv HTML Papers project, available for new TeX/LaTeX submissions since 2023. It summarizes key developments from 2025 and early 2026: community-driven improvements to HTML fidelity and service health resolving roughly half of 6,000 user reports; corpus-scale conversion work targeting 90% error-free HTML (currently at 75%); initial MathML 4 Intent annotations for accessible speech output; and an in-progress Rust port of LaTeXML to reduce compute costs and enable faster previews. The project is described as experimental but gradually maturing.

Significance. If the reported metrics and developments hold, this work advances accessibility for mathematical content on arXiv by improving HTML rendering and incorporating MathML 4 for better support of assistive technologies such as speech output. The integration of community feedback with modernization of conversion pipelines could serve as a practical model for large-scale scholarly repositories handling complex technical documents.

major comments (1)

[Abstract] Highlight (ii): The current 75% error-free rate and the 90% target are presented without a definition of 'error-free', details of the sampling or measurement methodology, or an error analysis. This is load-bearing: without it, readers cannot judge whether the corpus-scale conversion effort is on track.

minor comments (3)

[Highlight (i)] Clarify the categories of issues covered by the 6,000 user reports and the criteria used to judge that roughly half have been resolved.
The manuscript would benefit from a brief discussion of remaining technical challenges or limitations in the LaTeX corpus that could affect scaling to 90% error-free conversions.
Add references to prior work on LaTeXML, MathML standards, or previous arXiv accessibility efforts to better contextualize the reported advances.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their review of our status report on the arXiv HTML Papers project. We address the major comment below.


Referee: [Abstract] Highlight (ii): The current 75% error-free rate and the 90% target are presented without a definition of 'error-free', details of the sampling or measurement methodology, or an error analysis. This is load-bearing: without it, readers cannot judge whether the corpus-scale conversion effort is on track.

Authors: We agree that the abstract would be strengthened by additional context on these metrics. The current version presents the figures at a high level without defining 'error-free' or outlining the evaluation approach. In the revised manuscript we will expand highlight (ii) to state that 'error-free' denotes conversions passing our automated validation suite for rendering and accessibility integrity, note that the 75% rate derives from a stratified sample of recent submissions, and direct readers to the error analysis section in the body of the paper. This change will make the quantitative claim more transparent while preserving the abstract's brevity.

revision: yes
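A stratified sample like the one mentioned in this response implies a weighted estimate: each stratum's observed error-free rate is weighted by that stratum's share of all submissions, rather than pooling every audited paper. The sketch below is illustrative only; the `Stratum` type, the strata, and all counts are hypothetical, since the response does not say how strata are defined.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    name: str        # hypothetical label, e.g. a group of arXiv categories
    population: int  # submissions in this stratum during the period
    sampled: int     # papers audited from this stratum
    error_free: int  # audited papers that passed validation


def stratified_rate(strata: list[Stratum]) -> float:
    """Weight each stratum's observed error-free rate by its share of submissions."""
    total = sum(s.population for s in strata)
    return sum(s.population / total * s.error_free / s.sampled for s in strata)


def pooled_rate(strata: list[Stratum]) -> float:
    """Naive estimate that ignores stratum sizes and pools all audited papers."""
    return sum(s.error_free for s in strata) / sum(s.sampled for s in strata)


if __name__ == "__main__":
    # Hypothetical numbers chosen only to show the two estimates diverging.
    strata = [
        Stratum("A", population=6000, sampled=100, error_free=80),
        Stratum("B", population=4000, sampled=100, error_free=70),
    ]
    print(round(stratified_rate(strata), 2))  # 0.76
    print(round(pooled_rate(strata), 2))      # 0.75
```

Here the weighted estimate exceeds the pooled one because the larger stratum also converts better; with real data the gap could go either way, which is why the stratification scheme should be reported alongside the headline rate.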

Circularity Check

0 steps flagged

No significant circularity


This paper is a factual status report summarizing engineering progress on arXiv's HTML conversion pipeline, user report resolutions, conversion metrics, MathML annotations, and a Rust port of LaTeXML. It contains no derivations, equations, fitted parameters, quantitative predictions, or load-bearing self-citations. All figures are presented as direct project observations and milestones rather than results derived from the paper's own inputs or prior claims by the same authors.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The report rests on the domain assumption that LaTeXML remains a viable base for high-fidelity LaTeX-to-HTML conversion and that community feedback accurately reflects user needs.

axioms (1)

domain assumption: LaTeXML can be extended and ported to support MathML 4 and improved fidelity at scale.
The entire HTML conversion effort and the planned Rust port depend on the continued viability and extensibility of LaTeXML.

pith-pipeline@v0.9.0 · 5691 in / 1460 out tokens · 66077 ms · 2026-05-20T17:49:12.173874+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Passage: We report on the ongoing development of arXiv’s HTML Papers offering... community-driven improvements to HTML fidelity... corpus-scale conversion work aimed at 90% error-free HTML... initial MathML 4 Intent annotations... Rust port of LaTeXML

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Passage: MathML 4 Intent annotations for accessible speech output

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

13 extracted references · 13 canonical work pages · 1 internal anchor

[1] arXiv HTML feedback tracker. https://github.com/arXiv/html_feedback, accessed April 17, 2026

[2] arXiv monthly submissions. https://arxiv.org/stats/monthly_submissions, accessed April 17, 2026

[3] ar5iv: arXiv articles as responsive web pages. https://blog.arxiv.org/2022/02/21/arxiv-articles-as-responsive-web-pages/ (2022), accessed April 17, 2026

[4] Igalia brings MathML back to Chromium. https://www.igalia.com/2023/01/10/Igalia-Brings-MathML-Back-to-Chromium.html (2023), accessed April 17, 2026

[5] arXiv is becoming an independent nonprofit. https://blog.arxiv.org/2026/04/02/arxiv-is-becoming-an-independent-nonprofit/ (2026), accessed April 17, 2026

[6] arXiv: LaTeX markup best practices for successful HTML papers. https://info.arxiv.org/help/submit_latex_best_practices.html, accessed April 17, 2026

[7] Frankston, C., Godfrey, J.R., Brinn, S., Hofer, A., Nazzaro, M.: HTML papers on arXiv: why it is important, and how we made it happen. arXiv:2402.08954 (2024). https://doi.org/10.48550/arXiv.2402.08954

[8] Miller, B.R., et al.: LaTeXML [software]. National Institute of Standards and Technology. https://math.nist.gov/~BMiller/LaTeXML/, accessed April 17, 2026

[9] Mittelbach, F., Fischer, U., Carlisle, D., Wright, J.: MathML and other XML Technologies for Accessible PDF from LaTeX. In: Proceedings of the 2025 ACM Symposium on Document Engineering (DocEng ’25). pp. 1–4 (2025). https://doi.org/10.1145/3704268.3748669

[10] Ross, J.: Accessibility for the working mathematician. arXiv:2505.22667 (2025). https://doi.org/10.48550/arXiv.2505.22667

[11] W3C Math Working Group: MathML Core. W3C candidate recommendation, W3C (2025), https://www.w3.org/TR/mathml-core/

[12] W3C Math Working Group: Mathematical Markup Language (MathML) Version 4.0: Mixing intent with mathml. W3C candidate recommendation draft, W3C (2026), https://www.w3.org/TR/mathml4/#mixing_intent

[13] W3C Technical Architecture Group: Web Platform Design Principles. W3C group note, W3C (2026), https://www.w3.org/TR/design-principles/
