GPUs are leaving performance on the table.
SemiAnalysis Twitter · SemiAnalysis (@SemiAnalysis_) · 2026-05-27
SemiAnalysis highlights research by Mohamed Abdelfattah at Makora showing that auto-generated CUDA kernels outperform hand-tuned ones at scale, and argues that GPUs are leaving significant performance on the table.
Extraction
Topics: gpu-performance, cuda-kernels, kernel-optimization, auto-tuning
Claims
- GPUs are not achieving their theoretical peak performance in real-world production workloads.
- Auto-generated CUDA kernels are outperforming hand-written ones at scale.
- Closing the gap between theoretical peak throughput and real-world performance is nearly impossible when CUDA kernels are hand-tuned at scale.
Key quotes
Closing the gap between theoretical peak and real-world throughput is nearly impossible when hand-tuning CUDA kernels at scale.
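The "gap between theoretical peak and real-world throughput" can be made concrete as a utilization ratio: achieved FLOP/s divided by the hardware's peak FLOP/s. The sketch below is illustrative only and does not come from the post or the research it cites. The function names (`matmul_flops`, `utilization`, `pick_fastest`) are hypothetical, and the peak figure and timings in the usage lines are made-up placeholders, not measurements of any real GPU or kernel.

```python
from __future__ import annotations


def matmul_flops(m: int, n: int, k: int) -> int:
    """FLOPs in an (m x k) @ (k x n) matrix multiply.

    Each of the m * n outputs needs k multiplies and k adds.
    """
    return 2 * m * n * k


def utilization(flops: float, seconds: float, peak_flops_per_s: float) -> float:
    """Fraction of theoretical peak throughput achieved by one kernel run."""
    if seconds <= 0 or peak_flops_per_s <= 0:
        raise ValueError("seconds and peak_flops_per_s must be positive")
    return (flops / seconds) / peak_flops_per_s


def pick_fastest(timings: dict[str, float]) -> str:
    """Return the name of the candidate kernel with the lowest measured time.

    This selection step is the core of a simple auto-tuning loop:
    generate variants, time each one, keep the winner.
    """
    return min(timings, key=timings.get)


if __name__ == "__main__":
    # Placeholder inputs (assumptions, not real measurements).
    peak = 100e12  # hypothetical peak of 100 TFLOP/s
    timings = {"hand_tuned": 2.0e-3, "auto_generated": 1.6e-3}  # seconds

    work = matmul_flops(4096, 4096, 4096)
    for name, seconds in timings.items():
        print(f"{name}: {utilization(work, seconds, peak):.1%} of peak")
    print("fastest:", pick_fastest(timings))
```

Hand-tuning repeats this measure-and-compare loop manually for each kernel shape and hardware target, which is one way to read the claim that the approach does not scale.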