archive
Every paper Pith has read.
1164 papers in cs.DC · page 15
-
HistMSO logic expresses 39 of 42 consistency models
HistMSO: A Logic for Reasoning about Consistency Models with MONA
-
Pessimistic sync cuts redundant I/Os in disaggregated KV stores
CIDER: Boosting Memory-Disaggregated Key-Value Stores with Pessimistic Synchronization
-
Fixed gating stabilizes federated averaging of pretrained models
FedSQ: Optimized Weight Averaging via Fixed Gating
-
Sparsity scores cut multimodal LLM latency 30 percent
MSAO: Adaptive Modality Sparsity-Aware Offloading with Edge-Cloud Collaboration for Efficient Multimodal LLM Inference
-
Digital twin optimizes metaverse XR offloading and resources
Digital Twin-Assisted In-Network and Edge Collaboration for Joint User Association, Task Offloading, and Resource Allocation in the Metaverse
-
Temporal gating cuts edge-cloud video costs by up to 60%
R2E-VID: Two-Stage Robust Routing via Temporal Gating for Elastic Edge-Cloud Video Inference
-
Sketch-based GPU solver handles 5000-asset portfolios in seconds
Scalable Mean-Variance Portfolio Optimization via Subspace Embeddings and GPU-Friendly Nesterov-Accelerated Projected Gradient
-
Heterogeneous memory lets GPUs run large nonlinear simulations
Accelerating Nonlinear Time-History Analysis with Complex Constitutive Laws via Heterogeneous Memory Management: From 3D Seismic Simulation to Neural Network Training
-
Communication-free sampling scales GNN training to 2048 GPUs
Communication-free Sampling and 4D Hybrid Parallelism for Scalable Mini-batch GNN Training
-
Cold TLB misses slow small GPU collectives up to 1.4x
Analyzing Reverse Address Translation Overheads in Multi-GPU Scale-Up Pods
-
DWDP lifts LLM output speed 8.8% per GPU by skipping rank sync
DWDP: Distributed Weight Data Parallelism for High-Performance LLM Inference on NVL72
-
A probabilistic bin-packing method lets cloud schedulers overcommit VMs while bounding…
Hotspot-Aware Scheduling of Virtual Machines with Overcommitment for Ultimate Utilization in Cloud Datacenters
-
Multi-scale graphs improve microservice latency estimates
Scene-Aware Latency Estimation for Microservices via Multi-Scale Graph Fusion
-
Shared replicas run fine-tuning and inference together on edge GPUs
CoLLM: Continuous Adaptation for SLO-Aware LLM Serving on Shared GPU Clusters
-
Multi-agent LLM workflow maps service text to KVI intervals
KPI2KVI: A Multi Agent Workflow for Calculating Key Value Indicators from Service Descriptions
-
Semantic triggers backdoor federated learning models
Beyond Corner Patches: Semantics-Aware Backdoor Attack in Federated Learning
-
Edge AI cuts sensor energy use via dynamic activation
An AI-Driven Framework for Energy-Efficient Environmental Monitoring in Smart Cities Using Edge Intelligence
-
Lumos captures bug provenance automatically for root cause ID
Wherefore Art Thou? Provenance-Guided Automatic Online Debugging with Lumos
-
GPU-FPGA pairing speeds LLM memory processing 2.2x
Understand and Accelerate Memory Processing Pipeline for Large Language Model Inference
-
Block-wise FL improves multimodal results up to 37.7% under sparse modalities
BLOSSOM: Block-wise Federated Learning Over Shared and Sparse Observed Modalities
-
Binary thresholds mark quantum advantage in sub-chips of two qubit technologies
Benchmarking Quantum Computers via Protocols, Comparing Superconducting and Ion-Trap Quantum Technology
-
Lossless compressor speeds Ascend NPU inference up to 6.3 times
ENEC: A Lossless AI Model Compression Method Enabling Fast Inference on Ascend NPUs
-
Tiny fingerprint selects best cache policy for shifting workloads
SCION: Size-aware Policy Orchestration for Nonstationary Object Caches (Long Paper Version)
-
Scheduler cuts multimodal LLM first-token latency by 54%
TCM-Serve: Modality-aware Scheduling for Multimodal Large Language Model Inference
-
NoC with direct core access speeds ML collectives 5.3x
A Lightweight High-Throughput Collective-Capable NoC for Large-Scale ML Accelerators
-
Network evolves protocols from intents into bytecode at runtime
DarwinNet: An Evolutionary Network Architecture for Agent-Driven Protocol Synthesis
-
Erasure coding reduces LLM checkpoint latency 2.7x
GhostServe: A Lightweight Checkpointing System in the Shadow for Fault-Tolerant LLM Serving
-
Data profiling cuts multimodal LLM training time up to 3.6x
DFLOP: A Data-driven Framework for Multimodal LLM Training Pipeline Optimization
-
Hybrid MPI+OpenMP scales PIC Monte Carlo to 16,000 GPUs
Multi-GPU Hybrid Particle-in-Cell Monte Carlo Simulations for Exascale Computing Systems
-
GPU framework speeds up graph edit distance by orders of magnitude
Efficient Accelerated Graph Edit Distance Computation on GPU
-
Lightning V2 achieves 4x lower TTS cost on Tenstorrent vs L40S
Rewriting TTS Inference Economics: Lightning V2 on Tenstorrent Achieves 4x Lower Cost Than NVIDIA L40S
-
Reasoning provenance cannot be recovered from state checkpoints alone
Reasoning Provenance for Autonomous AI Agents: Structured Behavioral Analytics Beyond State Checkpoints and Execution Traces
-
Product graph proves livelock freedom for all ring sizes
Practical Livelock Analysis in Parameterized Unidirectional Rings
-
WRP matrix maps LLM optimizations to 3x3 grid
The Workload-Router-Pool Architecture for LLM Inference Optimization: A Vision Paper from the vLLM Semantic Router Project
-
RoboECC splits VLA models for 3.28x edge-cloud speedup
RoboECC: Multi-Factor-Aware Edge-Cloud Collaborative Deployment for VLA Models
-
Updated Amdahl sets specialization threshold at 1-1/R
Modernizing Amdahl's Law: How AI Scaling Laws Shape Computer Architecture
-
Text-only supervision cannot enforce model honesty
Epistemic Observability in Language Models
-
Edge YOLO models keep hardware metrics stable under input faults
Hardware Utilization and Inference Performance of Edge Object Detection Under Fault Injection
-
Training memory bounded to twice inference for geometric AI
Adaptive Domain Models: Bayesian Evolution, Warm Rotation, and Principled Training for Geometric and Neuromorphic AI
-
Tokens per watt halves when context window doubles
The 1/W Law: An Analytical Study of Context-Length Routing Topology and GPU Generation Gains for LLM Inference Energy Efficiency
-
Structural monitoring signals catch quiet GPU detachments early
When GPUs Fail Quietly: Observability-Aware Early Warning Beyond Numeric Telemetry
-
Edge agents orchestrate smart homes with MQTT and Git
HearthNet: Edge Multi-Agent Orchestration for Smart Homes
-
CoGPU shares GPUs spatially with zero token drift
Performance Isolation and Semantic Determinism in Efficient GPU Spatial Sharing
-
Twin-field QKD secures blockchain with linear scaling
Security-enhanced Blockchain with Twin-Field Quantum Key Distribution: A Physical Layer enabled Architecture
-
Tezos protocol embeds native liquid staking
Canonical LST: A Protocol-Native Liquid Staking Solution for Tezos
-
DCGen builds datacenter models with IT, power, and cooling
DCGen 1.1 Technical Report: Generating Datacenter Configurations (including IT, Power, Cooling)
-
First CS research paper written entirely in Telugu
On the First Computer Science Research Paper in an Indian Language and the Future of Science in Indian Languages
-
CATS transport cuts first paint time by 78% in worst-case web load
A Case for CATS: A Conductor-driven Asymmetric Transport Scheme for Semantic Prioritization
-
Calibrated microgrid simulations match real node power to R^2 of 0.95
Calibrating Microgrid Simulations for Energy-Aware Computing Systems