archive
Every paper Pith has read.
89 papers in cs.OS · page 2
-
CPU-free LLM serving cuts P99 latency up to 8x
Blink: CPU-Free LLM Inference by Delegating the Serving Stack to GPU and SmartNIC
-
Client scheduler hits 100% LLM deadlines at 4.2 requests per second
Scheduling the Unschedulable: Taming Black-Box LLM Inference at Scale
-
Nexus cuts serverless CPU use 44% by offloading I/O from VMs
Nexus: Transparent I/O Offloading for High-Density Serverless Computing
-
Scheduler cuts quantum queue times 30-75% at high load
Qurator: Scheduling Hybrid Quantum-Classical Workflows Across Heterogeneous Cloud Providers
-
Single GPU trains 120B-parameter models at full precision
MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU
-
Migratable actors on CXL SSDs dodge thermal cliffs
WIO: Upload-Enabled Computational Storage on CXL SSDs
-
Scheduler pointer faults crash FreeRTOS far more often than TCB changes
Experimental Analysis of FreeRTOS Dependability through Targeted Fault Injection Campaigns
-
CoGPU shares GPUs spatially with zero token drift
Performance Isolation and Semantic Determinism in Efficient GPU Spatial Sharing
-
CATS transport cuts first paint time by 78% in worst-case web load
A Case for CATS: A Conductor-driven Asymmetric Transport Scheme for Semantic Prioritization
-
NCCLbpf adds verified eBPF policies to NCCL plugins with 130 ns overhead
NCCLbpf: Verified, Composable Policy Execution for GPU Collective Communication
-
Flexible mode switching speeds secure mobile LLM inference 10x
FlexServe: A Fast and Secure LLM Serving System for Mobile Devices with Flexible Resource Isolation
-
LLM agents run as native POSIX processes
Quine: Realizing LLM Agents as Native POSIX Processes
-
Unified objects automate IoT edge-cloud apps with 9 nines availability
EdgeWeaver: Accelerating IoT Application Development Across Edge-Cloud Continuum
-
TEE architecture secures continuous attestation against platform control
A TEE-Based Architecture for Confidential and Dependable Process Attestation in Authorship Verification
-
Slack-tokenized Transformer meets more real-time deadlines
TempoNet: Slack-Quantized Transformer-Guided Reinforcement Scheduler for Adaptive Deadline-Centric Real-Time Dispatchs
-
Graph engine keeps semantic state stable at microsecond speeds
The Compute ICE-AGE: Invariant Compute Envelope under Addressable Graph Evolution
-
Local generators keep update cost constant as system grows
Bounded Local Generator Classes for Deterministic State Evolution
-
The paper describes an integrated methodology combining hardware modeling
Interferences within a certifiable design methodology for high-performance multi-core platforms
-
Equilibria enforces CXL fairness and raises performance 52%
Equilibria: Fair Multi-Tenant CXL Memory Tiering At Scale
-
Original papers outperform tutorials for system design mastery
The Computer System Trail
-
Host RAM enables single-GPU training of 120B LLMs
Horizon-LM: A RAM-Centric Architecture for LLM Training
-
Beta metric delivers 96.5% optimal edge AI performance
Mitigating GIL Bottlenecks in Edge AI Systems
-
LLM agents finish over 80% of Rust system proofs
VeruSAGE: A Study of Agent-Based Verification for Rust Systems
-
Data movement bottlenecks sit outside the network core
Reexamining Paradigms of End-to-End Data Movement
-
CAEC lets secure VMs share memory without encryption
CAEC: Confidential, Attestable, and Efficient Inter-CVM Communication with Arm CCA
-
KV cache TTL cuts multi-turn agent job times by over 8x
Continuum: Efficient and Robust Multi-Turn LLM Agent Scheduling with KV Cache Time-to-Live
-
Sockeye formalizes hardware manuals into provable security models
Sockeye: a language for analyzing hardware documentation
-
NetCAS boosts remote storage speed 174% via dynamic I/O splits
NetCAS: Dynamic Cache and Backend Device Management in Networked Environments
-
Tyche turns isolation into a composable cloud primitive
Tyche: Composable Isolation as a Foundation to Manage Trust in the Cloud
-
Best agents need 2.7-4.3x more steps than humans
OSWorld-Human: Benchmarking the Efficiency of Computer-Use Agents
-
90% of Linux radiation failures route through one eMMC path
Where Linux Breaks Under Radiation: A Cross-Architecture Kernel-Level Characterization of Proton-Induced Failures in COTS SoCs
-
Type-1 hypervisor matches Docker speed with stronger isolation
Goldilocks Isolation: High Performance VMs with Edera
-
Review yields security framework for software-defined vehicles
Contextualizing Security and Privacy of Software-Defined Vehicles: A Literature Review and Industry Perspectives
-
FPGA scheduler lifts fairness 24-98% by adding time and energy rules
THEMIS: Time, Heterogeneity, and Energy Minded Scheduling for Fair Multi-Tenant Use in FPGAs
-
CPU-time budgets isolate tail latency in shared datapaths
Tail Contagion: Sub-microsecond Time Protection in Shared Software Network Datapaths
-
New file system design reduces SSD write amplification without GC
SSDFS: Towards LFS Flash-Friendly File System without GC operation
-
Smoosh semantics matches POSIX standard more closely than seven shells
Executable formal semantics for the POSIX shell
-
DiOS guarantees identical traces for repeated POSIX program runs
Reproducible Execution of POSIX Programs with DiOS
-
Hardware scheduler delivers 12x speedup on accelerator systems
HTS: A Hardware Task Scheduler for Heterogeneous Systems
-
Lawn timer handles any time range at constant speed
Lawn: an Unbound Low Latency Timer Data Structure for Large Scale, High Throughput Systems
-
DMX keeps critical container performance stable as density rises
Container Density Improvements with Dynamic Memory Extension using NAND Flash
-
TrustZone world switches carry measurable time and energy costs
On The Performance of ARM TrustZone