![Performance engineering for real and complex tall & skinny matrix multiplication kernels on GPUs - Dominik Ernst, Georg Hager, Jonas Thies, Gerhard Wellein, 2021](https://journals.sagepub.com/cms/10.1177/1094342020965661/asset/images/large/10.1177_1094342020965661-fig2.jpeg)
Performance engineering for real and complex tall & skinny matrix multiplication kernels on GPUs - Dominik Ernst, Georg Hager, Jonas Thies, Gerhard Wellein, 2021
The performance of our cache-efficient matrix multiplication algorithm...
![Underfox on Twitter: "In this paper, researchers have proposed a new bit data format for efficient design of Bit-Matrix-Multiplication and Bit-Convolution in Tensor Cores in NVIDIA Turing GPUs. https://t.co/aozfwOsbxy https://t.co/NUxjo2rvdT"](https://pbs.twimg.com/media/Eb8BqTPXgAc4dRx.png)
Underfox on Twitter: "In this paper, researchers have proposed a new bit data format for efficient design of Bit-Matrix-Multiplication and Bit-Convolution in Tensor Cores in NVIDIA Turing GPUs. https://t.co/aozfwOsbxy https://t.co/NUxjo2rvdT"
![GPU computing performance analysis on matrix multiplication - Huang - 2019 - The Journal of Engineering - Wiley Online Library](https://ietresearch.onlinelibrary.wiley.com/cms/asset/fb2451f8-c958-47a3-a516-dfa289dbbadc/tje2bf02890-fig-0012-m.jpg)
GPU computing performance analysis on matrix multiplication - Huang - 2019 - The Journal of Engineering - Wiley Online Library
![Accelerating Matrix Multiplication with Block Sparse Format and NVIDIA Tensor Cores | NVIDIA Technical Blog](https://developer-blogs.nvidia.com/wp-content/uploads/2021/03/GEMM.png)
Accelerating Matrix Multiplication with Block Sparse Format and NVIDIA Tensor Cores | NVIDIA Technical Blog
![Computation | Developing a New Storage Format and a Warp-Based SpMV Kernel for Configuration Interaction Sparse Matrices on the GPU](https://www.mdpi.com/computation/computation-06-00045/article_deploy/html/images/computation-06-00045-g001.png)
Computation | Developing a New Storage Format and a Warp-Based SpMV Kernel for Configuration Interaction Sparse Matrices on the GPU
Fast and Memory Efficient Strassen's Matrix Multiplication on GPU Cluster - Spectrum: Concordia University Research Repository
![Fast Batched Matrix Multiplication for Small Sizes Using Half-Precision Arithmetic on GPUs | Semantic Scholar](https://d3i71xaburhd42.cloudfront.net/47243ec8bf2774cbb8f3fa08270aceac33eb5fbb/9-Figure10-1.png)
Fast Batched Matrix Multiplication for Small Sizes Using Half-Precision Arithmetic on GPUs | Semantic Scholar
![Efficient Sparse-Dense Matrix-Matrix Multiplication on GPUs Using the Customized Sparse Storage Format | Semantic Scholar](https://d3i71xaburhd42.cloudfront.net/3452abaaeade5dff673f1b305686dc00b7ba4528/2-Figure1-1.png)
Efficient Sparse-Dense Matrix-Matrix Multiplication on GPUs Using the Customized Sparse Storage Format | Semantic Scholar
![An Efficient GPU General Sparse Matrix-Matrix Multiplication for Irregular Data | Semantic Scholar](https://d3i71xaburhd42.cloudfront.net/d03695d3b8db00933696c4a58217de62b3df2ef7/10-Figure5-1.png)