LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Kernel Tuner
Triton implementation of FlashAttention-2 with support for custom masks.
Implementation of the Conjugate Gradients method in C and NVIDIA CUDA.
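The repository above implements Conjugate Gradients in C/CUDA; as a minimal CPU sketch of the same algorithm, here is a pure-Python version (the helper functions and the example system are illustrative, not taken from the repository):

```python
# Conjugate Gradients for solving A x = b, where A is symmetric positive definite.
# Pure-Python CPU sketch; a CUDA version would parallelize matvec and dot.

def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))

def conjugate_gradients(A, b, tol=1e-12, max_iter=1000):
    n = len(b)
    x = [0.0] * n                      # initial guess x0 = 0
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]  # residual r = b - A x
    p = list(r)                        # initial search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)    # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        if rs_new < tol:               # converged: residual is tiny
            break
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x

# Example: solve a small 2x2 SPD system.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = conjugate_gradients(A, b)          # exact solution is [1/11, 7/11]
```

On a GPU the two hot spots are the matrix-vector product and the reductions (`dot`), which is why the method maps well to CUDA.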
Faster PyTorch bitsandbytes 4-bit fp4 nn.Linear ops
This repository contains examples of CUDA usage in Cython code.
This is a cross-chip platform collection of operators and a unified neural network library.
Code repository for ICLR 2025 paper "LeanQuant: Accurate and Scalable Large Language Model Quantization with Loss-error-aware Grid"
PyTorch implementation of a message passing neural network with RNN sub-units
Object Tracking using GPU acceleration.
This project integrates a custom CUDA-based matrix multiplication kernel into a PyTorch deep learning model, leveraging GPU acceleration for matrix operations. The goal is to compare the performance of this custom kernel with PyTorch's built-in matrix multiplication and demonstrate how custom CUDA kernels can optimize compute-intensive operations.
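Custom CUDA matmul kernels like the one described above usually gain their speed from tiling: each thread block computes one output tile while staging sub-tiles of the inputs in shared memory. A pure-Python CPU sketch of that blocked loop structure (illustrative only; the project itself compiles a real CUDA kernel and benchmarks it against PyTorch's built-in matmul):

```python
# Naive triple-loop matmul: the CPU analogue of one thread per output element.
def matmul_naive(A, B):
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            C[i][j] = sum(A[i][t] * B[t][j] for t in range(k))
    return C

# Blocked (tiled) matmul: the output is computed tile-by-tile, mirroring how
# a CUDA thread block stages `tile x tile` chunks of A and B in shared memory
# before accumulating partial products.
def matmul_tiled(A, B, tile=2):
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for ii in range(0, n, tile):           # tile row of C
        for jj in range(0, m, tile):       # tile column of C
            for tt in range(0, k, tile):   # tile along the shared dimension
                for i in range(ii, min(ii + tile, n)):
                    for j in range(jj, min(jj + tile, m)):
                        for t in range(tt, min(tt + tile, k)):
                            C[i][j] += A[i][t] * B[t][j]
    return C

A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
B = [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]]
assert matmul_naive(A, B) == matmul_tiled(A, B)
```

On a GPU, the tiled variant wins because each staged tile is reused by every thread in the block, cutting global-memory traffic; in pure Python both versions are equally slow, so this is only a sketch of the loop structure.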
Spiral's Machine Learning Library
KernelHeim – development ground of custom Triton and CUDA kernel functions designed to optimize and accelerate machine learning workloads on NVIDIA GPUs. Inspired by the mythical stronghold of the gods, KernelHeim is a forge where high-performance kernels are crafted to unlock the full potential of the hardware.
SParry is a Python shortest-path tool that uses CUDA to speed up several shortest-path algorithms.
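For reference, the CPU baseline such GPU tools accelerate is a standard shortest-path algorithm like Dijkstra's; a minimal pure-Python sketch (the graph format here is illustrative and not SParry's actual API):

```python
import heapq

def dijkstra(graph, source):
    """Single-source shortest paths over an adjacency list:
    graph maps node -> list of (neighbor, edge_weight)."""
    dist = {node: float("inf") for node in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry; a shorter path was already found
        for v, w in graph[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

g = {
    "a": [("b", 1.0), ("c", 4.0)],
    "b": [("c", 2.0)],
    "c": [],
}
dist = dijkstra(g, "a")  # a -> c is cheaper via b (1.0 + 2.0) than direct (4.0)
```

Dijkstra's priority queue is inherently sequential, which is why GPU implementations typically switch to more parallel-friendly formulations such as Bellman-Ford or delta-stepping.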
GPU Programming with CUDA: a repository of the materials I used while teaching myself CUDA and building a strong foundation in the basics.
CNNs for spectrogram-based music recommendation (Undergraduate dissertation)
A Python script that uses the CuPy library to perform optimized GPU matrix multiplication. It includes a custom CUDA kernel tuned for performance and energy consumption, using half-precision floating-point numbers (float16) to improve throughput and warp utilization.
High-performance 2D Quantum Dot (QD) Simulator implemented in C++ and Python
A collection of ultra-simple yet high-performance CUDA kernels.