Quantization and training of neural networks for efficient integer-arithmetic-only inference
7 Pith papers cite this work.
citing papers explorer
- AIGaitor: Privacy-preserving and cloud-free motion analysis for everyone, using edge computing
  The paper presents AIGaitor, a privacy-preserving on-device monocular motion analysis system that performs end-to-end pose estimation and deep learning gait analysis on consumer smartphones.
- Search Your Block Floating Point Scales!
  ScaleSearch optimizes block floating point scales via fine-grained search to cut quantization error by 27% for NVFP4, improving PTQ by up to 15 points on MATH500 for Qwen3-8B and attention PPL by 0.77 on Llama 3.1 70B.
- HOLE: Homological Observation of Latent Embeddings for Neural Network Interpretability
  HOLE applies persistent homology to latent embeddings in neural networks and uses visualizations such as cluster flow diagrams to reveal patterns of class separation, feature disentanglement, and robustness.
- EmbeddingGemma: Powerful and Lightweight Text Representations
  A 300M-parameter open embedding model sets new SOTA on MTEB for its size class and matches models twice as large while staying effective when compressed.
- Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs
  FastGen adaptively compresses LLM KV caches via lightweight attention profiling: it evicts long-range contexts on local heads, evicts non-special tokens on special-token heads, and retains full caches on broad-attention heads, yielding substantial memory savings with negligible quality loss.
- Hardware-Accelerated Event-Graph Neural Networks for Low-Latency Time-Series Classification on SoC FPGA
  FPGA hardware for an event-graph NN achieves 92.7% accuracy on the SHD dataset with fewer parameters than SOTA while outperforming prior FPGA SNNs.
- Development of embedded target detection system based on FPGA and YOLOv3-Tiny
  An FPGA implementation of quantized and fused YOLOv3-Tiny achieves 0.211 s latency and 10.11 GOPS/W efficiency with up to 51.94% lower resource utilization.