Quantization and training of neural networks for efficient integer-arithmetic-only inference

Jacob, B · 2018 · arXiv 2018.00286

7 Pith papers cite this work. Polarity classification is still being indexed.

representative citing papers

AIGaitor: Privacy-preserving and cloud-free motion analysis for everyone, using edge computing

cs.CV · 2026-05-20 · unverdicted · novelty 7.0

The paper presents AIGaitor, a privacy-preserving on-device monocular motion analysis system that performs end-to-end pose estimation and deep learning gait analysis on consumer smartphones.

Search Your Block Floating Point Scales!

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

ScaleSearch optimizes block floating point scales via fine-grained search to cut quantization error by 27% for NVFP4, improving PTQ by up to 15 points on MATH500 for Qwen3-8B and attention PPL by 0.77 on Llama 3.1 70B.

HOLE: Homological Observation of Latent Embeddings for Neural Network Interpretability

cs.LG · 2025-12-08 · unverdicted · novelty 6.0

HOLE applies persistent homology to latent embeddings in neural networks and uses visualizations such as cluster flow diagrams to reveal patterns of class separation, feature disentanglement, and robustness.

EmbeddingGemma: Powerful and Lightweight Text Representations

cs.CL · 2025-09-24 · unverdicted · novelty 6.0

A 300M-parameter open embedding model sets new SOTA on MTEB for its size class and matches models twice as large while staying effective when compressed.

Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs

cs.CL · 2023-10-03 · conditional · novelty 6.0

FastGen adaptively compresses LLM KV caches via lightweight attention profiling: evicting long-range contexts on local heads, non-special tokens on special-token heads, and retaining full caches on broad-attention heads, yielding substantial memory savings with negligible quality loss.

Hardware-Accelerated Event-Graph Neural Networks for Low-Latency Time-Series Classification on SoC FPGA

cs.LG · 2025-03-09 · unverdicted · novelty 5.0

FPGA hardware for event-graph NN achieves 92.7% accuracy on SHD dataset with fewer parameters than SOTA while outperforming prior FPGA SNNs.

Development of embedded target detection system based on FPGA and YOLOv3-Tiny

physics.chem-ph · 2026-05-07 · unverdicted · novelty 3.0

An FPGA implementation of quantized and fused YOLOv3-Tiny achieves 0.211 s latency and 10.11 GOPS/W efficiency with up to 51.94% lower resource utilization.

citing papers explorer

Showing 7 of 7 citing papers.

AIGaitor: Privacy-preserving and cloud-free motion analysis for everyone, using edge computing
cs.CV · 2026-05-20 · unverdicted · none · ref 81
Search Your Block Floating Point Scales!
cs.LG · 2026-05-12 · unverdicted · none · ref 126
HOLE: Homological Observation of Latent Embeddings for Neural Network Interpretability
cs.LG · 2025-12-08 · unverdicted · none · ref 35
EmbeddingGemma: Powerful and Lightweight Text Representations
cs.CL · 2025-09-24 · unverdicted · none · ref 8
Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs
cs.CL · 2023-10-03 · conditional · none · ref 36
Hardware-Accelerated Event-Graph Neural Networks for Low-Latency Time-Series Classification on SoC FPGA
cs.LG · 2025-03-09 · unverdicted · none · ref 37
Development of embedded target detection system based on FPGA and YOLOv3-Tiny
physics.chem-ph · 2026-05-07 · unverdicted · none · ref 9
