Towards an Accurate GPU Data Race Detector
Pith reviewed 2026-05-13 21:20 UTC · model grok-4.3
The pith
Analyzing the CPU code that launches GPU kernels lets a static detector find every true data race in the evaluated GPU programs without raising a single false alarm.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
HGRD performs a holistic analysis of both CPU and GPU code to accurately detect a broad set of true races while minimizing false alarms. While state-of-the-art (SOTA) dynamic techniques such as iGUARD miss many true races, HGRD misses none. Static techniques such as GPUVerify and FaialAA, on the other hand, raise tens of false alarms, whereas HGRD raises none.
What carries the argument
HGRD's holistic static analysis, which extracts semantic constraints on GPU kernel parameter values from the CPU kernel-launch code and applies them when checking the kernel for races.
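A toy illustration of this mechanism, written as a minimal Python sketch under stated assumptions: the kernel `a[tid] = a[tid + stride]` (one thread per index, no barriers) and a host launcher that passes `stride = blocks * threads_per_block` are hypothetical, and `may_race` is an illustrative bounded brute-force search standing in for a solver. It is not HGRD's implementation; it only shows how one host-side fact turns a kernel-only false alarm into a race-free verdict.

```python
from itertools import product
from typing import Iterable, List, Optional, Tuple

# Hypothetical kernel, one thread per index tid in [0, n_threads), no barriers:
#     a[tid] = a[tid + stride];
def accesses(tid: int, stride: int) -> List[Tuple[int, bool]]:
    """(address, is_write) pairs issued by one thread."""
    return [(tid, True), (tid + stride, False)]

def may_race(n_threads: int, stride_values: Iterable[int]) -> Optional[Tuple[int, int, int]]:
    """Search for two distinct threads that touch the same address, at least one
    of them writing, for some admissible stride. Returns a witness or None."""
    for stride in stride_values:
        for t1, t2 in product(range(n_threads), repeat=2):
            if t1 == t2:
                continue  # accesses within one thread are ordered, not racy
            for addr1, w1 in accesses(t1, stride):
                for addr2, w2 in accesses(t2, stride):
                    if addr1 == addr2 and (w1 or w2):
                        return (t1, t2, stride)
    return None

N = 64  # total threads in the hypothetical launch, e.g. kernel<<<2, 32>>>(a, 2 * 32)

# Kernel-only view: stride is an arbitrary int; a bounded range stands in for "any value".
print(may_race(N, range(0, 128)))  # (0, 1, 1): thread 0 reads a[1] while thread 1 writes it
# Host-aware view: the launcher fixes stride to the total thread count.
print(may_race(N, [N]))            # None
```

In this toy, any stride of at least `N` is race-free, so a weaker fact such as `stride >= N` would also suffice; the point is that the fact lives in the host code, not in the kernel.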
Load-bearing premise
The CPU host code contains extractable semantic information about the values that GPU kernel parameters can actually take during execution.
What would settle it
A GPU program and its CPU launcher for which HGRD either reports a race that never occurs under any input or misses a race that does occur.
Original abstract
Data races in GPU programs pose a threat to the reliability of GPU-accelerated software stacks. Prior work has proposed various dynamic (runtime) and static (compile-time) techniques to detect races in GPU programs. However, dynamic techniques often miss critical races, as they require the races to manifest during testing. Static techniques can catch such races, but they often generate numerous false alarms by conservatively assuming variable and parameter values that can never occur in any execution of the program. We make a key observation that the host (CPU) code that launches GPU kernels contains crucial semantic information about the values that the GPU kernel's parameters can take during execution. Harnessing this hitherto overlooked information helps accurately detect data races in GPU kernel code. We create HGRD, a new state-of-the-art static analysis technique that performs a holistic analysis of both CPU and GPU code to accurately detect a broad set of true races while minimizing false alarms. While SOTA dynamic techniques, such as iGUARD, miss many true races, HGRD misses none. On the other hand, static techniques such as GPUVerify and FaialAA raise tens of false alarms, whereas HGRD raises none.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces HGRD, a static analysis technique for GPU data-race detection that performs holistic analysis of both CPU host code (which launches kernels) and GPU kernel code. It claims that, by extracting precise constraints on kernel parameters from the host launch site, HGRD detects a broad set of true races with zero misses (unlike dynamic tools such as iGUARD) and zero false alarms (unlike static tools such as GPUVerify and FaialAA).
Significance. If the central claim holds, the work would be a meaningful contribution to GPU program reliability: it directly addresses the well-known precision gap between dynamic (incomplete) and static (over-approximating) race detectors by exploiting host-kernel semantic linkage that prior GPU race-detection literature has largely ignored.
major comments (2)
- [Abstract and §3] Host Analysis: the claim that host-code extraction is both sound and precise enough to eliminate all false alarms while preserving every true race is load-bearing. Any over-approximation arising from loops, conditionals, or pointer arithmetic in the launcher would reintroduce spurious races, so the manuscript must supply either a formal soundness argument or concrete evidence (e.g., the exact abstract domains used and their precision on the benchmark launch sites) that this does not occur for the evaluated programs.
- [§5] Evaluation: the abstract asserts that HGRD “misses none” and “raises none” while iGUARD misses many races and GPUVerify/FaialAA raise tens of false alarms, yet no table, benchmark list, or quantitative breakdown (true-positive count, false-positive count, runtime, memory) is referenced. Without these data the central performance claim cannot be assessed.
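The per-tool counts requested here can be pinned down with a small Python sketch. It assumes, hypothetically, that a race is identified by a kernel name plus an unordered pair of source lines; the paper's actual notion of race identity may differ. Under this convention, "misses none" means `fn == 0` and "raises none" means `fp == 0` against a ground-truth set. The race identifiers below are invented placeholders, not data from the paper.

```python
from typing import Dict, Set, Tuple

Race = Tuple[str, int, int]  # (kernel, line_a, line_b) with line_a <= line_b

def race(kernel: str, line1: int, line2: int) -> Race:
    """Normalize a race so (k, 12, 10) and (k, 10, 12) compare equal."""
    return (kernel, min(line1, line2), max(line1, line2))

def tally(reported: Set[Race], ground_truth: Set[Race]) -> Dict[str, int]:
    """True positives, false alarms (fp), and missed races (fn) for one tool."""
    return {
        "tp": len(reported & ground_truth),
        "fp": len(reported - ground_truth),
        "fn": len(ground_truth - reported),
    }

# Invented placeholders, not results from the paper.
truth = {race("k1", 10, 12), race("k2", 5, 9)}
print(tally({race("k1", 12, 10)}, truth))        # {'tp': 1, 'fp': 0, 'fn': 1}
print(tally(truth | {race("k3", 1, 2)}, truth))  # {'tp': 2, 'fp': 1, 'fn': 0}
```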
minor comments (2)
- [§2] Clarify the exact race classes covered (intra-kernel only, inter-kernel, host-device, etc.) and whether the analysis is flow-, context-, or path-sensitive.
- [§6] Add a short related-work paragraph contrasting HGRD with recent hybrid CPU-GPU analyses (e.g., any post-2022 papers on host-kernel parameter inference).
Simulated Author's Rebuttal
Thank you for the detailed review and constructive feedback. We appreciate the identification of areas where the manuscript's claims require stronger support. We address each major comment below and will revise the paper to incorporate the requested clarifications and evidence.
-
Referee: [Abstract and §3] Host Analysis: the claim that host-code extraction is both sound and precise enough to eliminate all false alarms while preserving every true race is load-bearing. Any over-approximation arising from loops, conditionals, or pointer arithmetic in the launcher would reintroduce spurious races, so the manuscript must supply either a formal soundness argument or concrete evidence (e.g., the exact abstract domains used and their precision on the benchmark launch sites) that this does not occur for the evaluated programs.
Authors: We agree that the soundness and precision of host-code extraction is central and load-bearing. The current manuscript sketches the analysis but lacks an explicit formal argument. In the revision we will add a dedicated subsection in §3 presenting a formal soundness proof based on the abstract interpretation framework used, specifying the domains (interval analysis augmented with congruence for kernel parameters, and a flow-sensitive points-to analysis for pointers). We will also include a table showing, for each benchmark launch site, the concrete constraints extracted versus the actual runtime values observed, demonstrating that loops and conditionals are handled without introducing over-approximation that creates false races. This evidence will be drawn from the same benchmark set used in the evaluation. revision: yes
-
Referee: [§5] Evaluation: the abstract asserts that HGRD “misses none” and “raises none” while iGUARD misses many races and GPUVerify/FaialAA raise tens of false alarms, yet no table, benchmark list, or quantitative breakdown (true-positive count, false-positive count, runtime, memory) is referenced. Without these data the central performance claim cannot be assessed.
Authors: We acknowledge that the abstract claims are not explicitly tied to the evaluation data in the current draft. The manuscript does contain Table 1 (benchmarks and results) and Table 2 (runtime/memory), but the text in §5 and the abstract do not reference them directly when stating the “misses none / raises none” results. In the revision we will (1) add explicit cross-references from the abstract and §5 to these tables, (2) expand the text to report the exact true-positive and false-positive counts for each tool on every benchmark, and (3) include the missing per-benchmark breakdown of runtime and memory. This will make the quantitative support for the central claims immediately verifiable. revision: partial
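The first response above names interval analysis augmented with congruence as the abstract domain for kernel parameters. A minimal Python sketch of that textbook combination (not the authors' code) shows why the pairing matters at a join point: after merging the values that a hypothetical host loop `for (s = 256; s <= 1024; s *= 2)` passes as a kernel parameter, the interval alone admits any value in [256, 1024], while the congruence component also records that every value is a multiple of 256. The same example shows the residual over-approximation the referee is concerned about: 768 is admitted even though the loop never produces it.

```python
from dataclasses import dataclass
from math import gcd

@dataclass(frozen=True)
class IntervalCongruence:
    """Abstract integer {x : lo <= x <= hi and x ≡ rem (mod mod)}.
    mod == 0 encodes the exact value rem (a singleton congruence)."""
    lo: int
    hi: int
    mod: int
    rem: int

    @staticmethod
    def const(c: int) -> "IntervalCongruence":
        return IntervalCongruence(c, c, 0, c)

    def join(self, other: "IntervalCongruence") -> "IntervalCongruence":
        """Componentwise join: interval hull paired with the congruence join
        (modulus = gcd of both moduli and the distance between remainders)."""
        lo, hi = min(self.lo, other.lo), max(self.hi, other.hi)
        m = gcd(gcd(self.mod, other.mod), abs(self.rem - other.rem))
        rem = self.rem % m if m else self.rem
        return IntervalCongruence(lo, hi, m, rem)

    def contains(self, x: int) -> bool:
        if not self.lo <= x <= self.hi:
            return False
        return x == self.rem if self.mod == 0 else x % self.mod == self.rem

# Hypothetical host loop, unrolled here because its bounds are constants:
#     for (int s = 256; s <= 1024; s *= 2) kernel<<<g, b>>>(a, s);
stride = IntervalCongruence.const(256)
for s in (512, 1024):
    stride = stride.join(IntervalCongruence.const(s))

print(stride)                # IntervalCongruence(lo=256, hi=1024, mod=256, rem=0)
print(stride.contains(300))  # False: inside the interval, but not a multiple of 256
print(stride.contains(768))  # True: admitted although the loop never passes it
```

A full reduced product would also tighten the interval bounds to the nearest congruent values, and loops with non-constant bounds would need widening; both are omitted here because the constants in this example already line up.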
Circularity Check
No circularity: HGRD derives from independent static analysis of CPU launch sites and GPU kernels
The paper's central technique extracts constraints on kernel parameters from host CPU code and feeds them into a GPU race checker. This is a standard program-analysis composition with no fitted parameters, no self-definitional equations, and no load-bearing self-citations that reduce the result to its own inputs. The abstract and description present the method as a direct static analysis of existing program text; no step renames a known result or smuggles an ansatz via prior work by the same authors. The derivation remains self-contained against the semantics of the analyzed programs.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: host CPU code provides semantic information about the values GPU kernel parameters take during execution.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
unclear: relation between the paper passage and the cited Recognition theorem.
HGRD analyzes the host code to extract five classes of semantic information... asserts, thread grid dimensions, relation among kernel parameters, loop bounds, allocation size
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
unclear: relation between the paper passage and the cited Recognition theorem.
HGRD invokes a SAT solver to check satisfiability of constraints for finding pairs of memory instructions that may race
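The two quoted passages describe a pipeline: host facts become constraints, and a solver searches those constraints for racing instruction pairs. A minimal Python sketch of the first half, under stated assumptions: `HostFacts`, its fields, and the launch site shown in comments are hypothetical; each field corresponds to one of the five quoted classes (asserts, thread grid dimensions, relations among kernel parameters, loop bounds, allocation size); and `count_feasible` enumerates a small bounded domain instead of calling a SAT/SMT solver. It shows how far the conjunction of host facts shrinks the parameter space that a kernel-only analysis must consider.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Callable, Dict, List

Params = Dict[str, int]
Pred = Callable[[Params], bool]

@dataclass
class HostFacts:
    """Hypothetical record of the five classes of host-derived facts."""
    asserts: List[Pred] = field(default_factory=list)            # host assert(...) conditions
    blocks: int = 1                                               # thread grid dimensions (1-D)
    threads: int = 1
    relations: List[Pred] = field(default_factory=list)          # relations among kernel parameters
    loop_bounds: Dict[str, range] = field(default_factory=dict)  # values a host loop can pass
    alloc_elems: Dict[str, int] = field(default_factory=dict)    # allocation size per buffer

    def total_threads(self) -> int:
        return self.blocks * self.threads

    def admits(self, p: Params) -> bool:
        """Conjunction of all facts: can assignment p reach the kernel?"""
        if any(p[name] not in values for name, values in self.loop_bounds.items()):
            return False
        return all(pred(p) for pred in self.asserts + self.relations)

def count_feasible(facts: HostFacts, names: List[str], bound: int) -> int:
    """Count assignments in [0, bound)^len(names) that the facts admit."""
    return sum(
        facts.admits(dict(zip(names, values)))
        for values in product(range(bound), repeat=len(names))
    )

# Hypothetical host code:
#     cudaMalloc(&a, 256 * sizeof(int));
#     for (int n = 32; n <= 128; n += 32) {
#         assert(n > 0);
#         kernel<<<4, 32>>>(a, n, 4 * 32);   // kernel(int *a, int n, int stride)
#     }
facts = HostFacts(blocks=4, threads=32, alloc_elems={"a": 256})
facts.asserts.append(lambda p: p["n"] > 0)
facts.relations.append(lambda p: p["stride"] == facts.total_threads())
facts.loop_bounds["n"] = range(32, 129, 32)
# Assumes the kernel reads a[i + stride] for i < n and the program stays in bounds.
facts.relations.append(lambda p: p["n"] + p["stride"] <= facts.alloc_elems["a"])

print(count_feasible(HostFacts(), ["n", "stride"], 256))  # 65536: kernel-only view
print(count_feasible(facts, ["n", "stride"], 256))        # 4: n in {32, 64, 96, 128}, stride == 128
```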
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, Boston, MA, 2nd edition, 2006.
-
[2]
Amazon. P3 instances with V100. https://aws.amazon.com/ec2/instance-types/p3/, 2020.
-
[3]
Ethel Bardsley, Adam Betts, Nathan Chong, Peter Collingbourne, Pantazis Deligiannis, Alastair F. Donaldson, Jeroen Ketema, Daniel Liew, and Shaz Qadeer. Engineering a static verification tool for GPU kernels. In Armin Biere and Roderick Bloem, editors, Computer Aided Verification, pages 226–242, Cham, 2014. Springer International Publishing.
- [4]
-
[5]
Michael A. Bender, Jeremy T. Fineman, Seth Gilbert, and Charles E. Leiserson. On-the-fly maintenance of series-parallel relationships in fork-join multithreaded programs. In Proceedings of the Sixteenth Annual ACM Symposium on Parallelism in Algorithms and Architectures, SPAA ’04, page 133–144, New York, NY, USA, 2004. Association for Computing Machinery.
-
[7]
Adam Betts, Nathan Chong, Alastair Donaldson, Shaz Qadeer, and Paul Thomson. GPUVerify: a verifier for GPU kernels. SIGPLAN Not., 47(10):113–132, October 2012.
-
[8]
Michael D. Bond, Katherine E. Coons, and Kathryn S. McKinley. Pacer: proportional detection of data races. In Proceedings of the 31st ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’10, page 255–268, New York, NY, USA, 2010. Association for Computing Machinery.
- [9]
-
[10]
Soham Chakraborty, S. Krishna, Andreas Pavlogiannis, and Omkar Tuppe. GPUMC: A stateless model checker for GPU weak memory concurrency. In Computer Aided Verification: 37th International Conference, CAV 2025, Zagreb, Croatia, July 23-25, 2025, Proceedings, Part III, page 321–346, Berlin, Heidelberg, 2025. Springer-Verlag.
-
[11]
Shuai Che, Michael Boyer, Jiayuan Meng, David Tarjan, Jeremy W. Sheaffer, Sang-Ha Lee, and Kevin Skadron. Rodinia: A benchmark suite for heterogeneous computing. In 2009 IEEE International Symposium on Workload Characterization (IISWC), 2009.
-
[12]
Tiago Cogumbreiro, Julien Lange, Dennis Liew Zhen Rong, and Hannah Zicarelli. Checking data-race freedom of GPU kernels, compositionally. In Computer Aided Verification: 33rd International Conference, CAV 2021, Virtual Event, July 20–23, 2021, Proceedings, Part I, page 403–426, Berlin, Heidelberg, 2021. Springer-Verlag.
-
[13]
Keith D. Cooper, Timothy J. Harvey, and Ken Kennedy. A simple, fast dominance algorithm, 2001. Technical report, Rice University; widely cited implementation-oriented treatment.
-
[14]
Lucas Cordeiro and Bernd Fischer. Verifying multi-threaded software using SMT-based context-bounded model checking. In Proceedings of the 33rd International Conference on Software Engineering, ICSE ’11, page 331–340, New York, NY, USA, 2011. Association for Computing Machinery.
-
[15]
Leonardo De Moura and Nikolaj Bjørner. Z3: an efficient SMT solver. In Proceedings of the Theory and Practice of Software, 14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS’08/ETAPS’08, page 337–340, Berlin, Heidelberg, 2008. Springer-Verlag.
-
[16]
Laura Effinger-Dean, Brandon Lucia, Luis Ceze, Dan Grossman, and Hans-J. Boehm. IFRit: interference-free regions for dynamic data-race detection. In Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages and Applications, OOPSLA ’12, page 467–484, New York, NY, USA, 2012. Association for Computing Machinery.
-
[17]
Ariel Eizenberg, Yuanfeng Peng, Toma Pigli, William Mansky, and Joseph Devietti. BARRACUDA: binary-level analysis of runtime races in CUDA programs. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2017, page 126–140, New York, NY, USA, 2017. Association for Computing Machinery.
-
[18]
Cormac Flanagan and Stephen N. Freund. FastTrack: efficient and precise dynamic race detection. In Proceedings of the 30th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’09, page 121–133, New York, NY, USA, 2009. Association for Computing Machinery.
-
[19]
GitHub. Scoped racey (ScoR) benchmark suite. https://github.com/csl-iisc/ScoR/tree/master, 2020.
- [20]
-
[21]
Anup Holey, Vineeth Mekkat, and Antonia Zhai. HAccRG: Hardware-accelerated data race detection in GPUs. In Proceedings of the 2013 42nd International Conference on Parallel Processing, ICPP ’13, page 60–69, USA, 2013. IEEE Computer Society.
-
[23]
IBM. Expression trees. https://www.ibm.com/docs/en/zos/3.1.0?topic=example-expression-trees, 2023.
-
[24]
John Jacobson, Martin Burtscher, and Ganesh Gopalakrishnan. HiRace: Accurate and fast data race checking for GPU programs. In Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, SC ’24. IEEE Press, 2024.
-
[25]
Zheming Jin and Jeffrey S. Vetter. A benchmark suite for improving performance portability of the SYCL programming model. In 2023 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pages 325–327, 2023.
-
[26]
Aditya K. Kamath and Arkaprava Basu. iGUARD: In-GPU advanced race detection. In Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles, SOSP ’21, page 49–65, New York, NY, USA, 2021. Association for Computing Machinery.
-
[28]
Aditya K. Kamath, Alvin A. George, and Arkaprava Basu. ScoRD: a scoped race detector for GPUs. In Proceedings of the ACM/IEEE 47th Annual International Symposium on Computer Architecture, ISCA ’20, page 1036–1049. IEEE Press, 2020.
-
[29]
David Koester and Bob Lucas. Random access. https://icl.utk.edu/projectsfiles/hpcc/RandomAccess/
-
[30]
Michalis Kokologiannakis, Iason Marmanis, Vladimir Gladstein, and Viktor Vafeiadis. Truly stateless, optimal dynamic partial order reduction. Proc. ACM Program. Lang., 6(POPL), January 2022.
-
[31]
Chris Lattner, Mehdi Amini, Uday Bondhugula, Albert Cohen, Andy Davis, Jacques Pienaar, River Riddle, Tatiana Shpeisman, Nicolas Vasilache, and Oleksandr Zinenko. MLIR: scaling compiler infrastructure for domain specific computation. In Proceedings of the 2021 IEEE/ACM International Symposium on Code Generation and Optimization, CGO ’21, page 2–14...
-
[32]
Thomas Lengauer and Robert E. Tarjan. A fast algorithm for finding dominators in a flowgraph. ACM Transactions on Programming Languages and Systems, 1(1):121–141, 1979.
-
[33]
Guodong Li and Ganesh Gopalakrishnan. Scalable SMT-based verification of GPU kernel functions. In Proceedings of the Eighteenth ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE ’10, page 187–196, New York, NY, USA, 2010. Association for Computing Machinery.
-
[34]
Guodong Li, Peng Li, Geof Sawaya, Ganesh Gopalakrishnan, Indradeep Ghosh, and Sreeranga P. Rajan. GKLEE: concolic verification and test generation for GPUs. In Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP ’12, page 215–224, New York, NY, USA, 2012. Association for Computing Machinery.
-
[35]
Pengcheng Li, Xiaoyu Hu, Dong Chen, Jacob Brock, Hao Luo, Eddy Z. Zhang, and Chen Ding. LD: Low-overhead GPU race detection without access monitoring. ACM Trans. Archit. Code Optim., 14(1), March 2017.
- [36]
-
[37]
Dennis Liew, Tiago Cogumbreiro, and Julien Lange. Sound and partially-complete static analysis of data-races in GPU programs. Proc. ACM Program. Lang., 8(OOPSLA2), October 2024.
-
[38]
Yiqian Liu, Noushin Azami, Corbin Walters, and Martin Burtscher. The Indigo program-verification microbenchmark suite of irregular parallel code patterns. In 2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pages 24–34, 2022.
-
[39]
Shan Lu, Soyeon Park, Eunsoo Seo, and Yuanyuan Zhou. Learning from mistakes: a comprehensive study on real world concurrency bug characteristics. SIGOPS Oper. Syst. Rev., 42(2):329–339, March 2008.
-
[40]
Shan Lu, Joseph Tucek, Feng Qin, and Yuanyuan Zhou. AVIO: detecting atomicity violations via access interleaving invariants. In Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS XII, page 37–48, New York, NY, USA, 2006. Association for Computing Machinery.
-
[41]
LLVM MLIR. mlir::DominanceInfo class reference. https://mlir.llvm.org/doxygen/classmlir_1_1DominanceInfo.html, 2025.
-
[42]
LLVM MLIR. mlir::Value class reference. https://mlir.llvm.org/doxygen/classmlir_1_1Value.html, 2025.
-
[43]
William S. Moses, Lorenzo Chelini, Ruizhe Zhao, and Oleksandr Zinenko. Polygeist: Raising C to polyhedral MLIR. In Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques, PACT ’21, New York, NY, USA, 2021. Association for Computing Machinery.
-
[44]
Ajay Nayak and Arkaprava Basu. Over-synchronization in GPU programs. In 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 795–809, 2024.
-
[45]
NVIDIA. GPUs everywhere. https://blogs.nvidia.com/blog/2017/05/08/microsoft-azure-gpu-instances/, 2019.
-
[46]
NVIDIA. Compute Sanitizer. https://docs.nvidia.com/compute-sanitizer/ComputeSanitizer/index.html, 2025.
-
[47]
NVIDIA. CUDA C++ programming guide. https://docs.nvidia.com/cuda/cuda-c-programming-guide/, 2025.
- [48]
-
[49]
NVIDIA. Tuning CUDA applications for Volta. https://docs.nvidia.com/cuda/volta-tuning-guide/index.html, 2025.
-
[50]
Yuanfeng Peng, Vinod Grover, and Joseph Devietti. CURD: a dynamic CUDA race detector. SIGPLAN Not., 53(4):390–403, June 2018.
-
[51]
Laurent Perron and Frédéric Didier. CP-SAT. https://developers.google.com/optimization/cp/cp_solver/, 2024.
-
[52]
Mirco Ravanelli, Titouan Parcollet, and Yoshua Bengio. The PyTorch-Kaldi speech recognition toolkit. In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 6465–6469, 2019.
-
[53]
Stefan Savage, Michael Burrows, Greg Nelson, Patrick Sobalvarro, and Thomas Anderson. Eraser: a dynamic data race detector for multithreaded programs. ACM Trans. Comput. Syst., 15(4):391–411, November 1997.
-
[54]
Konstantin Serebryany and Timur Iskhodzhanov. ThreadSanitizer: data race detection in practice. In Proceedings of the Workshop on Binary Instrumentation and Applications, WBIA ’09, page 62–71, New York, NY, USA, 2009. Association for Computing Machinery.
-
[55]
Tyler Sorensen. GKLEE tests. https://github.com/Geof23/Gklee/blob/master/Gklee/CUDA/Benchmarks/Misc_Test/Tyler, 2013.
-
[56]
Haining Tong, Natalia Gavrilenko, Hernan Ponce de Leon, and Keijo Heljanko. Towards unified analysis of GPU consistency. In Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4, ASPLOS ’24, page 329–344, New York, NY, USA, 2025. Association for Computing Machinery.
-
[57]
Haris Volos, Andres Jaan Tack, Michael M. Swift, and Shan Lu. Applying transactional memory to concurrency bugs. SIGPLAN Not., 47(4):211–222, March 2012.
-
[58]
Benjamin P. Wood, Luis Ceze, and Dan Grossman. Low-level detection of language-level data races with LARD. In Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’14, page 671–686, New York, NY, USA, 2014. Association for Computing Machinery.
-
[59]
Mingyuan Wu, Yicheng Ouyang, Husheng Zhou, Lingming Zhang, Cong Liu, and Yuqun Zhang. Simulee: detecting CUDA synchronization bugs via memory-access modeling. In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering, ICSE ’20, page 937–948, New York, NY, USA, 2020. Association for Computing Machinery.
-
[60]
Wei Zhang, Junghee Lim, Ramya Olichandran, Joel Scherpelz, Guoliang Jin, Shan Lu, and Thomas Reps. ConSeq: detecting concurrency bugs through sequential errors. SIGPLAN Not., 46(3):251–264, March 2011.
-
[61]
Mai Zheng, Vignesh T. Ravi, Feng Qin, and Gagan Agrawal. GRace: a low-overhead mechanism for detecting data races in GPU programs. SIGPLAN Not., 46(8):135–146, February 2011.
-
[62]
Mai Zheng, Vignesh T. Ravi, Feng Qin, and Gagan Agrawal. GMRace: Detecting data races in GPU programs via a low-overhead scheme. IEEE Trans. Parallel Distrib. Syst., 25(1):104–115, January 2014.
-
[63]
Pin Zhou, Radu Teodorescu, and Yuanyuan Zhou. HARD: Hardware-assisted lockset-based race detection. In 2007 IEEE 13th International Symposium on High Performance Computer Architecture, pages 121–132, 2007.
A Supporting Scoped Operations
A.1 Background on scopes
Synchronizing across all the threads of a GPU kernel is slow and often unnecessary due to the...