Abstract: Accelerating matrix multiplication is crucial to achieve high performance in many application domains, including neural networks, graph analytics, and scientific computing. These ...
CUDA-L2 is a system that combines large language models (LLMs) and reinforcement learning (RL) to automatically optimize Half-precision General Matrix Multiply (HGEMM) CUDA kernels. CUDA-L2 ...
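The snippet does not show any of CUDA-L2's kernels. As a hedged illustration of what an HGEMM kernel computes, here is a minimal NumPy reference of the kind an automatic kernel optimizer could check its output against. It assumes the usual BLAS convention C = alpha·A·B + beta·C with float16 inputs and outputs. The function name `hgemm_reference`, the `tile` parameter, and float32 accumulation are illustrative assumptions, not details from the source.

```python
import numpy as np

def hgemm_reference(a, b, c=None, alpha=1.0, beta=0.0, tile=32):
    """Hypothetical tiled half-precision GEMM: C = alpha * A @ B + beta * C.

    Inputs are cast to float16. Each K-tile product is accumulated in
    float32 (an assumed mixed-precision scheme), and the final result is
    rounded back to float16.
    """
    a = np.asarray(a, dtype=np.float16)
    b = np.asarray(b, dtype=np.float16)
    m, k = a.shape
    k2, n = b.shape
    if k != k2:
        raise ValueError("inner dimensions of A and B must match")
    acc = np.zeros((m, n), dtype=np.float32)
    # Walk the K dimension in tiles, loosely mirroring how a CUDA kernel
    # stages tiles of A and B before multiplying them.
    for k0 in range(0, k, tile):
        a_tile = a[:, k0:k0 + tile].astype(np.float32)
        b_tile = b[k0:k0 + tile, :].astype(np.float32)
        acc += a_tile @ b_tile
    out = np.float32(alpha) * acc
    if c is not None and beta != 0.0:
        # Add the scaled existing C, itself treated as a float16 matrix.
        out += np.float32(beta) * np.asarray(c, dtype=np.float16).astype(np.float32)
    return out.astype(np.float16)
```

For example, `hgemm_reference(A, B)` with the 2×2 matrices `[[1, 2], [3, 4]]` and `[[5, 6], [7, 8]]` yields `[[19, 22], [43, 50]]` as float16. Keeping the accumulator in float32 is a common way to limit rounding error over long inner dimensions.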
To set up the Python environment, install the libraries specified in pyproject.toml. If you are a Rye user, you can run rye sync to set up the environment. We developed a C++ extension for the event data ...
Abstract: An improved variant of the precise-integration time-domain (PITD) method is proposed to eliminate the inverse matrix calculation and reduce the storage burden with the help of sparse ...
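The snippet does not describe the proposed variant itself. For background, the conventional precise integration method that PITD builds on computes the transfer matrix T = exp(Hτ) with the 2^N algorithm. It takes a short Taylor step over τ/2^N, keeps the increment Ta = exp(HΔt) − I separate from the identity to avoid round-off, and then squares N times. The sketch below shows only that standard step, not the improved, inverse-free variant from the abstract. The name `precise_integration_exp` and the default `n=20` are illustrative assumptions.

```python
import numpy as np

def precise_integration_exp(h, tau, n=20):
    """Approximate T = exp(H * tau) with the 2^N precise-integration algorithm.

    This is the conventional method, not the improved PITD variant.
    """
    h = np.asarray(h, dtype=np.float64)
    eye = np.eye(h.shape[0])
    dt = tau / 2**n
    hd = h * dt
    hd2 = hd @ hd
    # 4th-order Taylor series of exp(H*dt) - I. The identity is stored
    # separately so the tiny increment is not swamped by round-off.
    ta = hd + hd2 / 2.0 + hd2 @ hd / 6.0 + hd2 @ hd2 / 24.0
    # Square n times: (I + Ta)^2 = I + (2*Ta + Ta @ Ta), so only Ta is updated.
    for _ in range(n):
        ta = 2.0 * ta + ta @ ta
    return eye + ta
```

For a diagonal H = diag(−1, −2) and τ = 0.5, the result matches diag(e^(−0.5), e^(−1)) to near machine precision.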