Attention is all you need
2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.LG (2)
years: 2026 (2)
verdicts: UNVERDICTED (2)
representative citing papers
DA-MSDL maintains predictive performance on drifting multivariate time series by detecting distribution shifts without labels and adapting via prioritized replay and hierarchical fine-tuning.
citing papers explorer
- Hybrid JIT-CUDA Graph Optimization for Low-Latency Large Language Model Inference
  A hybrid JIT-CUDA Graph framework reduces TTFT by up to 66% and P99 latency versus TensorRT-LLM for single-GPU LLaMA-2 7B inference on short prompts.
- Drift-Aware Online Dynamic Learning for Nonstationary Multivariate Time Series: Application to Sintering Quality Prediction
  DA-MSDL maintains predictive performance on drifting multivariate time series by detecting distribution shifts without labels and adapting via prioritized replay and hierarchical fine-tuning.