Llamas on the Web: Memory-Efficient, Performance-Portable, and Multi-Precision LLM Inference with WebGPU
Pith reviewed 2026-05-21 02:48 UTC · model grok-4.3
The pith
LlamaWeb is a WebGPU backend for llama.cpp that cuts browser LLM memory use by 29-33 percent relative to existing browser-based frameworks and raises decode throughput by 45-69 percent on four GPUs from different vendors.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
LlamaWeb enables memory-efficient and performance-portable LLM inference in the browser by reducing memory overhead through static memory planning and efficient model loading, addressing cross-device variability through a tunable kernel library, and supporting multiple quantization formats through templated GPU kernels.
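This summary does not spell out how static memory planning works. As a rough illustration only, the Python sketch below shows one common way to plan memory ahead of time. Given each intermediate tensor's size and the range of graph operations during which it is alive, it assigns fixed byte offsets in a single arena so that tensors with non-overlapping lifetimes can reuse the same bytes. The `Tensor` type, the greedy-by-size ordering, and the 256-byte alignment are assumptions made for the example. They are not details of the paper's planner or of llama.cpp's allocator.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Tensor:
    name: str
    size: int   # bytes
    first: int  # index of the op that produces the tensor
    last: int   # index of the last op that reads it


def _align_up(n: int, align: int) -> int:
    return -(-n // align) * align


def plan_offsets(tensors: List[Tensor], align: int = 256) -> Dict[str, int]:
    """Greedy-by-size planner: tensors alive at the same time never share bytes."""
    offsets: Dict[str, int] = {}
    placed: List[Tensor] = []
    for t in sorted(tensors, key=lambda x: x.size, reverse=True):
        # Byte ranges already taken by tensors whose lifetimes overlap t's lifetime.
        busy = sorted(
            (offsets[p.name], offsets[p.name] + p.size)
            for p in placed
            if p.first <= t.last and t.first <= p.last
        )
        off = 0
        for start, end in busy:
            if off + t.size <= start:
                break  # t fits in the gap before this range
            off = max(off, _align_up(end, align))
        offsets[t.name] = off
        placed.append(t)
    return offsets


def arena_bytes(tensors: List[Tensor], offsets: Dict[str, int]) -> int:
    """Size of the single buffer that would hold every planned tensor."""
    return max(offsets[t.name] + t.size for t in tensors)


if __name__ == "__main__":
    graph = [Tensor("a", 1024, 0, 1), Tensor("b", 1024, 1, 2), Tensor("c", 1024, 2, 3)]
    offs = plan_offsets(graph)
    print(offs)                      # {'a': 0, 'b': 1024, 'c': 0}
    print(arena_bytes(graph, offs))  # 2048, versus 3072 if every tensor got its own buffer
```

Because the plan is computed once, before inference, the arena can be allocated a single time at its final size. No buffers need to grow or be reallocated while tokens are generated.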
What carries the argument
Templated GPU kernels inside a tunable kernel library that together support multiple quantization formats while adapting to different devices and browsers.
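The paper's kernels are WGSL shaders specialized per quantization format, and the summary does not show them. The Python sketch below mirrors only the structure the claim describes. One shared block loop does the work, and a small per-format decoder is plugged into it, so adding a format means adding a decoder and a layout entry. The block layouts follow llama.cpp's Q8_0 and Q4_0 formats (32 weights per block with an fp16 scale). The `QuantFormat` type and the function names are hypothetical.

```python
import struct
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class QuantFormat:
    name: str
    block_size: int                          # weights per block
    block_bytes: int                         # bytes per block
    decode: Callable[[bytes], List[float]]   # one block -> block_size floats


def _decode_q8_0(blk: bytes) -> List[float]:
    d = struct.unpack_from("<e", blk, 0)[0]   # fp16 scale
    qs = struct.unpack_from("<32b", blk, 2)   # 32 signed 8-bit values
    return [q * d for q in qs]


def _decode_q4_0(blk: bytes) -> List[float]:
    d = struct.unpack_from("<e", blk, 0)[0]   # fp16 scale
    qs = blk[2:18]                            # 16 bytes, two 4-bit values each
    low = [((b & 0x0F) - 8) * d for b in qs]  # weights 0..15
    high = [((b >> 4) - 8) * d for b in qs]   # weights 16..31
    return low + high


FORMATS: Dict[str, QuantFormat] = {
    "q8_0": QuantFormat("q8_0", 32, 34, _decode_q8_0),
    "q4_0": QuantFormat("q4_0", 32, 18, _decode_q4_0),
}


def dequantize(fmt: QuantFormat, data: bytes) -> List[float]:
    """Shared 'kernel body': the same block loop for every format."""
    if len(data) % fmt.block_bytes:
        raise ValueError("input is not a whole number of blocks")
    out: List[float] = []
    for off in range(0, len(data), fmt.block_bytes):
        out.extend(fmt.decode(data[off:off + fmt.block_bytes]))
    return out


if __name__ == "__main__":
    # One Q4_0 block: scale 0.5, every byte 0x79 (low nibble 9, high nibble 7).
    block = struct.pack("<e", 0.5) + bytes([0x79] * 16)
    vals = dequantize(FORMATS["q4_0"], block)
    print(vals[0], vals[-1])  # 0.5 -0.5
```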
If this is right
- LLM inference becomes feasible on a wider range of consumer hardware without custom native code.
- Multiple quantization formats can be supported from a single kernel codebase with minimal added overhead.
- Browser-based applications can keep model weights and computations entirely on the client device.
- Performance is competitive with other llama.cpp backends and, on some devices, exceeds vendor-specific backends.
Where Pith is reading between the lines
- The same static-planning approach could be applied to other inference engines to reduce browser memory footprints.
- Local browser execution inherently keeps user prompts and outputs off remote servers, improving privacy for interactive AI tools.
- Extending the kernel library to additional low-precision formats would further widen the set of runnable models on memory-limited devices.
Load-bearing premise
The performance and memory measurements collected on the 16 tested devices and four weight formats are representative of typical real-world browser usage patterns and hardware variability.
What would settle it
Running the same models on a device-browser pair outside the original test set and checking whether memory reduction stays inside the 29-33 percent band and decode speedup stays inside the 45-69 percent band.
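The check amounts to two percentages with different denominators. Memory saved is measured against the baseline footprint, and the decode gain is measured against the baseline tokens per second. Below is a minimal sketch of that arithmetic. The device measurements are hypothetical and do not come from the paper.

```python
def pct_less_memory(baseline_bytes: float, new_bytes: float) -> float:
    """Memory saved, as a percentage of the baseline framework's footprint."""
    return 100.0 * (baseline_bytes - new_bytes) / baseline_bytes


def pct_faster_decode(baseline_tps: float, new_tps: float) -> float:
    """Decode throughput gain, as a percentage of the baseline tokens/second."""
    return 100.0 * (new_tps - baseline_tps) / baseline_tps


def within(value: float, lo: float, hi: float) -> bool:
    return lo <= value <= hi


if __name__ == "__main__":
    # Hypothetical measurements on one new device-browser pair (not from the paper).
    mem = pct_less_memory(3000.0, 2070.0)    # MiB
    dec = pct_faster_decode(20.0, 31.0)      # tokens/s
    print(round(mem, 1), within(mem, 29, 33))  # 31.0 True
    print(round(dec, 1), within(dec, 45, 69))  # 55.0 True
```

Because the denominators differ, "33 percent less memory" means the baseline uses about 1.49 times as much memory, and not 1.33 times as much.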
Original abstract
Running language models in the browser presents a unique opportunity to build efficient, private, and portable AI applications, but requires contending with constrained memory availability and heterogeneous hardware targets. To realize this opportunity, we present Llamas on the Web (LlamaWeb), a WebGPU backend for llama.cpp that enables memory-efficient and performance-portable LLM inference across a wide range of model weight formats in the browser. Our design significantly reduces memory overhead through static memory planning and efficient model loading, addresses cross-device variability through a tunable kernel library, and introduces templated GPU kernels that support performant implementations of numerous quantization formats, enabling broad model support and extensibility to new formats. We evaluate LlamaWeb on 16 devices from 8 vendors, collecting data from 10 language models and four model weight formats. We compare LlamaWeb against existing browser-based LLM frameworks and find that LlamaWeb requires 29-33% less memory across several combinations of device, browser, and operating system. We also evaluate LlamaWeb's performance against these frameworks and find that it increases decode throughput by 45-69% across four GPUs from separate vendors. In addition, we compare LlamaWeb's performance against other llama.cpp backends, where it is competitive with and even beats vendor-specific backend performance on some devices.
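The abstract says a tunable kernel library handles cross-device variability, but it does not describe the tuning procedure. The Python sketch below assumes that tuning means timing a small set of candidate kernel configurations once per device and caching the fastest one. The parameter names (workgroup size, rows per workgroup), the candidate values, the device key, and the stand-in cost function are all hypothetical.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Optional, Tuple

# Hypothetical kernel knobs: (workgroup size, output rows handled per workgroup).
Config = Tuple[int, int]


def autotune(candidates: Iterable[Config],
             measure: Callable[[Config], float],
             repeats: int = 3) -> Config:
    """Pick the candidate with the lowest median time over `repeats` runs."""
    best_cfg: Optional[Config] = None
    best_time = float("inf")
    for cfg in candidates:
        times = sorted(measure(cfg) for _ in range(repeats))
        median = times[len(times) // 2]
        if median < best_time:
            best_cfg, best_time = cfg, median
    if best_cfg is None:
        raise ValueError("no candidate configurations given")
    return best_cfg


# Tuned results cached per device key, e.g. a (vendor, architecture) pair.
_tuning_cache: Dict[Tuple[str, str], Config] = {}


def config_for(device: Tuple[str, str], measure: Callable[[Config], float]) -> Config:
    """Tune once per device, then reuse the cached choice."""
    if device not in _tuning_cache:
        space = product([64, 128, 256], [1, 2, 4])
        _tuning_cache[device] = autotune(space, measure)
    return _tuning_cache[device]


if __name__ == "__main__":
    # Stand-in cost model: pretend this device prefers 128-wide workgroups and 2 rows each.
    fake = lambda c: abs(c[0] - 128) / 64 + abs(c[1] - 2)
    print(config_for(("vendor-a", "arch-x"), fake))  # (128, 2)
```

The sketch takes the median of several runs so that a single noisy timing cannot decide the winner. This matters in browsers, where timings can be perturbed by factors such as shader compilation.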
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents LlamaWeb, a WebGPU backend for llama.cpp enabling memory-efficient, performance-portable, and multi-precision LLM inference in browsers. It uses static memory planning and efficient model loading to reduce overhead, a tunable kernel library to handle device variability, and templated GPU kernels supporting multiple quantization formats. Evaluations on 16 devices from 8 vendors with 10 models and 4 weight formats report 29-33% lower memory usage versus other browser frameworks and 45-69% higher decode throughput on four GPUs from separate vendors, while remaining competitive with other llama.cpp backends.
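The paper mentions efficient model loading but gives no details. One plausible technique, assumed here and not confirmed by the summary, is to stream weights through a small reusable staging buffer. The alternative is to hold the whole file in host memory before uploading it. With streaming, the extra host memory is bounded by the chunk size rather than by the model size. In the Python sketch below, the `bytearray` destination stands in for a GPU buffer, and the slice assignment stands in for a per-chunk upload call. `stream_into` is a hypothetical name.

```python
import io


def stream_into(src: io.BufferedIOBase, dst: bytearray, chunk_bytes: int) -> int:
    """Fill dst from src through one reusable staging chunk.

    Returns the staging size, i.e. the extra host memory this strategy holds.
    """
    staging = bytearray(chunk_bytes)
    view = memoryview(staging)
    pos = 0
    while pos < len(dst):
        n = src.readinto(view[: min(chunk_bytes, len(dst) - pos)])
        if not n:
            raise EOFError("source ended before the destination was filled")
        dst[pos:pos + n] = view[:n]  # stand-in for uploading one chunk to the GPU
        pos += n
    return len(staging)


if __name__ == "__main__":
    weights = bytes(range(256)) * 40          # 10,240 bytes of fake weights
    gpu_buffer = bytearray(len(weights))      # preallocated destination
    extra = stream_into(io.BytesIO(weights), gpu_buffer, chunk_bytes=1024)
    print(bytes(gpu_buffer) == weights, extra)  # True 1024 (a full host copy would need 10240)
```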
Significance. If the reported gains prove robust, this implementation could meaningfully advance browser-based LLM deployment by mitigating memory constraints and hardware heterogeneity while preserving privacy. Notable strengths include the broad multi-vendor device coverage, explicit support for multiple weight formats with extensibility, and direct integration with the established llama.cpp ecosystem, which facilitates reproducibility and adoption.
major comments (1)
- §5 (Evaluation): The headline claims of 29-33% memory reduction and 45-69% throughput improvement rest on a sample of 16 devices, and the throughput comparison draws on only four GPUs. The section provides no device-selection criteria, no run-to-run variance or error bars, and no sensitivity analysis for WebGPU driver differences, shader compilation overhead, or sandbox memory behavior under concurrent tab load. These omissions are load-bearing: without them, readers cannot judge whether the quoted percentages generalize beyond the tested sample.
minor comments (2)
- Abstract: The comparison baselines for the memory and throughput numbers are referenced only generically; naming the specific competing browser frameworks and llama.cpp backends in the abstract would improve immediate clarity.
- §4 (Implementation): The description of the templated kernel library would benefit from a small table listing supported quantization formats and their corresponding kernel variants for quick reference.
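The table the minor comment asks for could take the shape sketched below, which derives storage cost from block layout. This summary does not name the four formats the paper evaluates, so the rows are illustrative llama.cpp formats with fixed layouts. For example, Q4_0 stores 32 weights as an fp16 scale plus 16 bytes of 4-bit values, 18 bytes in all. The 1-billion-parameter model size is a hypothetical input.

```python
# (format, weights per block, bytes per block) for a few llama.cpp weight formats.
LAYOUTS = [
    ("f16", 1, 2),      # plain half precision
    ("q8_0", 32, 34),   # fp16 scale + 32 x int8
    ("q4_1", 32, 20),   # fp16 scale + fp16 minimum + 32 x 4-bit
    ("q4_0", 32, 18),   # fp16 scale + 32 x 4-bit
]


def bits_per_weight(block_size: int, block_bytes: int) -> float:
    return 8 * block_bytes / block_size


def weight_bytes(n_params: int, block_size: int, block_bytes: int) -> int:
    """Storage for n_params weights, rounding up to whole blocks."""
    n_blocks = -(-n_params // block_size)
    return n_blocks * block_bytes


if __name__ == "__main__":
    n = 1_000_000_000  # hypothetical 1B-parameter model
    print(f"{'format':<6} {'bits/weight':>11} {'GB':>6}")
    for name, bs, bb in LAYOUTS:
        gb = weight_bytes(n, bs, bb) / 1e9
        print(f"{name:<6} {bits_per_weight(bs, bb):>11.2f} {gb:>6.3f}")
```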
Simulated Author's Rebuttal
We thank the referee for their constructive feedback, which highlights important considerations for strengthening the evaluation section. We address the major comment point by point below and outline the revisions we will make.
Point-by-point responses
Referee: §5 (Evaluation): The headline claims of 29-33% memory reduction and 45-69% throughput improvement rest on measurements from 16 devices. The section provides no device-selection criteria, no run-to-run variance or error bars, and no sensitivity analysis for WebGPU driver differences, shader compilation overhead, or sandbox memory behavior under concurrent tab load. These omissions are load-bearing for assessing whether the quoted percentages generalize beyond the tested sample.
Authors: We agree that additional methodological details are needed to support the generalizability of the reported gains. In the revised manuscript, we will expand §5 with explicit device-selection criteria, explaining that the 16 devices were chosen to span 8 vendors and a range of performance tiers (from integrated graphics to high-end discrete GPUs) to evaluate portability. We will also add run-to-run variance information and error bars for the memory and throughput metrics, based on repeated measurements collected during our experiments. For sensitivity to WebGPU driver differences, shader compilation overhead, and concurrent-tab memory behavior, we will include a new discussion subsection acknowledging these factors as sources of variability in browser environments and reporting any qualitative observations from our testing across browsers and OSes. These revisions will be made without altering the core results.
Revision: yes
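The promised run-to-run statistics could be as simple as the sketch below. It summarizes repeated decode-throughput runs with a mean, a sample standard deviation, and a coefficient of variation. It then brackets a speedup by pairing the slowest and fastest runs. The run values are hypothetical. The min/max bracket is a deliberately crude choice; a bootstrap or t-based interval would be the more standard option.

```python
import statistics
from typing import Dict, Sequence, Tuple


def summarize(runs: Sequence[float]) -> Dict[str, float]:
    """Mean, sample standard deviation (n - 1), and coefficient of variation in percent."""
    if len(runs) < 2:
        raise ValueError("need at least two runs to estimate variance")
    mean = statistics.mean(runs)
    sd = statistics.stdev(runs)
    return {"mean": mean, "stdev": sd, "cv_pct": 100.0 * sd / mean}


def speedup_bracket(base: Sequence[float], new: Sequence[float]) -> Tuple[float, float]:
    """Worst-case and best-case throughput gain (%) from pairing extreme runs."""
    low = 100.0 * (min(new) / max(base) - 1.0)
    high = 100.0 * (max(new) / min(base) - 1.0)
    return low, high


if __name__ == "__main__":
    base = [20.0, 21.0, 19.0]   # tokens/s, hypothetical baseline framework
    new = [31.0, 30.0, 32.0]    # tokens/s, hypothetical LlamaWeb runs
    print(summarize(base))       # mean 20.0, stdev 1.0, cv_pct 5.0
    lo, hi = speedup_bracket(base, new)
    print(f"{lo:.1f}% to {hi:.1f}%")  # 42.9% to 68.4%
```

The example shows why error bars matter. In this made-up case, the 55 percent gain in mean throughput shrinks to a 43-68 percent range once run-to-run spread is taken into account.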
Circularity Check
No significant circularity; claims rest on direct implementation and empirical benchmarks
Full rationale
The paper describes an engineering implementation of a WebGPU backend for llama.cpp, including static memory planning, tunable kernels, and templated quantization support. All headline performance numbers (29-33% memory reduction, 45-69% decode throughput gains) are presented as direct outcomes of measurements collected on 16 devices across 8 vendors, 10 models, and 4 weight formats. No equations, fitted parameters, or uniqueness theorems are invoked; the central claims do not reduce to self-citations or to quantities defined in terms of the reported results. The work is therefore self-contained as an implementation-plus-measurement contribution with external comparisons.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: WebGPU is available and sufficiently stable on the target browsers and devices for the reported kernels to execute.