Title resolution pending

2024 · arXiv 2402.18158

6 Pith papers cite this work. Polarity classification is still indexing.


Title metadata for this work has not finished resolving. This hub is built from the citation graph; the title resolver will retry the DOI and OpenAlex lookups on its next pass.

citation-role summary

background: 1 · method: 1

citation-polarity summary

background: 1 · use method: 1

representative citing papers

From Signal Degradation to Computation Collapse: Uncovering the Two Failure Modes of LLM Quantization

cs.CL · 2026-04-21 · unverdicted · novelty 6.0

LLM 2-bit quantization fails via either cumulative signal degradation or early computation collapse in key components.

Are Large Language Models Economically Viable for Industry Deployment?

cs.CL · 2026-04-21 · unverdicted · novelty 6.0

Small LLMs under 2B parameters achieve better economic break-even, energy efficiency, and hardware density than larger models on legacy GPUs for industrial tasks.

QM-ToT: A Medical Tree of Thoughts Reasoning Framework for Quantized Model

cs.CL · 2025-04-13 · unverdicted · novelty 4.0

QM-ToT applies Tree of Thoughts decomposition and evaluator layers to quantized LLMs, reporting accuracy gains from 34% to 50% on MedQAUSMLE for LLaMA2-70b and from 58.77% to 69.49% for LLaMA-3.1-8b, plus an 86.27% improvement in data distillation using only 3.9% of the data.

Vision-Language and Large Language Model Performance in Gastroenterology: GPT, Claude, Llama, Phi, Mistral, Gemma, and Quantized Models

cs.CL · 2024-08-25 · unverdicted · novelty 4.0

GPT-4o and Claude 3.5 Sonnet reach 73.7-74% accuracy on gastroenterology questions; VLMs gain nothing from images and lose accuracy with LLM-generated captions.

Precision or Peril: A PoC of Python Code Quality from Quantized Large Language Models

cs.SE · 2024-11-16 · unverdicted · novelty 3.0

Smaller LLMs produce functional but limited Python code with variable quantization effects and quality/maintainability concerns that require validation before use.

A Survey on Efficient Inference for Large Language Models

cs.CL · 2024-04-22 · accept · novelty 3.0

The paper surveys techniques to speed up and reduce the resource needs of LLM inference, organized by data-level, model-level, and system-level changes, with comparative experiments on representative methods.

citing papers explorer

Showing 6 of 6 citing papers.

From Signal Degradation to Computation Collapse: Uncovering the Two Failure Modes of LLM Quantization
cs.CL · 2026-04-21 · unverdicted · none · ref 3
LLM 2-bit quantization fails via either cumulative signal degradation or early computation collapse in key components.

Are Large Language Models Economically Viable for Industry Deployment?
cs.CL · 2026-04-21 · unverdicted · none · ref 53
Small LLMs under 2B parameters achieve better economic break-even, energy efficiency, and hardware density than larger models on legacy GPUs for industrial tasks.

QM-ToT: A Medical Tree of Thoughts Reasoning Framework for Quantized Model
cs.CL · 2025-04-13 · unverdicted · none · ref 9
QM-ToT applies Tree of Thoughts decomposition and evaluator layers to quantized LLMs, reporting accuracy gains from 34% to 50% on MedQAUSMLE for LLaMA2-70b and from 58.77% to 69.49% for LLaMA-3.1-8b, plus an 86.27% improvement in data distillation using only 3.9% of the data.

Vision-Language and Large Language Model Performance in Gastroenterology: GPT, Claude, Llama, Phi, Mistral, Gemma, and Quantized Models
cs.CL · 2024-08-25 · unverdicted · none · ref 31
GPT-4o and Claude 3.5 Sonnet reach 73.7-74% accuracy on gastroenterology questions; VLMs gain nothing from images and lose accuracy with LLM-generated captions.

Precision or Peril: A PoC of Python Code Quality from Quantized Large Language Models
cs.SE · 2024-11-16 · unverdicted · none · ref 24
Smaller LLMs produce functional but limited Python code with variable quantization effects and quality/maintainability concerns that require validation before use.

A Survey on Efficient Inference for Large Language Models
cs.CL · 2024-04-22 · accept · none · ref 214
The paper surveys techniques to speed up and reduce the resource needs of LLM inference, organized by data-level, model-level, and system-level changes, with comparative experiments on representative methods.
